java.lang.Object
  de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<Clustering<Model>>
    de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.CASH
@Title(value="CASH: Robust clustering in arbitrarily oriented subspaces")
@Description(value="Subspace clustering algorithm based on the hough transform.")
@Reference(authors="E. Achtert, C. B\u00f6hm, J. David, P. Kr\u00f6ger, A. Zimek", title="Robust clustering in arbitrarily oriented subspaces", booktitle="Proc. 8th SIAM Int. Conf. on Data Mining (SDM\'08), Atlanta, GA, 2008", url="http://www.siam.org/proceedings/datamining/2008/dm08_69_AchtertBoehmDavidKroegerZimek.pdf")
public class CASH
Provides the CASH algorithm, a subspace clustering algorithm based on the Hough transform.
Reference: E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek: Robust
clustering in arbitrarily oriented subspaces.
In Proc. 8th SIAM Int. Conf. on Data Mining (SDM'08), Atlanta, GA, 2008
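As a rough two-dimensional illustration of the Hough-transform idea mentioned above (a sketch, not code from this class): each point (x, y) can be mapped to the function f(alpha) = x*cos(alpha) + y*sin(alpha), which gives the distance from the origin of the line through the point whose normal has angle alpha. Points on a common line then share one (alpha, distance) pair. The class and method names below are hypothetical.

```java
/** Hypothetical 2D illustration of a Hough-style parameterization; not part of ELKI. */
final class HoughSketch2D {
    /**
     * Distance from the origin of the line that passes through (x, y)
     * and whose normal vector is (cos(alpha), sin(alpha)).
     */
    static double parameterize(double x, double y, double alpha) {
        return x * Math.cos(alpha) + y * Math.sin(alpha);
    }
}
```

For example, the points (0, 2), (1, 1) and (2, 0) all lie on the line x + y = 2, and evaluating the function at alpha = pi/4 gives the same value, sqrt(2), for each of them.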
Nested Class Summary | |
---|---|
static class |
CASH.Parameterizer
Parameterization class. |
Field Summary | |
---|---|
private boolean |
adjust
Holds the value of ADJUST_ID. |
static OptionID |
ADJUST_ID
Flag to indicate that an adjustment of the applied heuristic for choosing an interval is performed after an interval is selected. |
private Relation<ParameterizationFunction> |
fulldatabase
The entire database |
private double |
jitter
Holds the value of JITTER_ID. |
static OptionID |
JITTER_ID
Parameter to specify the maximum jitter for distance values, must be a double greater than 0. |
private static Logging |
logger
The logger for this class. |
private int |
maxLevel
Holds the value of MAXLEVEL_ID. |
static OptionID |
MAXLEVEL_ID
Parameter to specify the maximum level for splitting the hypercube, must be an integer greater than 0. |
private int |
minDim
Holds the value of MINDIM_ID. |
static OptionID |
MINDIM_ID
Parameter to specify the minimum dimensionality of the subspaces to be found, must be an integer greater than 0. |
private int |
minPts
Holds the value of MINPTS_ID. |
static OptionID |
MINPTS_ID
Parameter to specify the threshold for minimum number of points in a cluster, must be an integer greater than 0. |
private int |
noiseDim
Holds the dimensionality for noise. |
private ModifiableDBIDs |
processedIDs
Holds a set of processed ids. |
Constructor Summary | |
---|---|
CASH(int minPts,
int maxLevel,
int minDim,
double jitter,
boolean adjust)
Constructor. |
Method Summary | |
---|---|
private MaterializedRelation<ParameterizationFunction> |
buildDB(int dim,
Matrix basis,
DBIDs ids,
Relation<ParameterizationFunction> relation)
Builds a (dim-1)-dimensional database in which the objects are projected into the specified subspace. |
private Database |
buildDerivatorDB(Relation<ParameterizationFunction> relation,
CASHInterval interval)
Builds a database for the derivator consisting of the ids in the specified interval. |
private Database |
buildDerivatorDB(Relation<ParameterizationFunction> relation,
DBIDs ids)
Builds a database for the derivator consisting of the specified ids. |
private Matrix |
determineBasis(double[] alpha)
Determines a basis defining a subspace described by the specified alpha values. |
private double[] |
determineMinMaxDistance(Relation<ParameterizationFunction> relation,
int dimensionality)
Determines the minimum and maximum function value of all parameterization functions stored in the specified database. |
private CASHInterval |
determineNextIntervalAtMaxLevel(Heap<IntegerPriorityObject<CASHInterval>> heap)
Determines the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects. |
private CASHInterval |
doDetermineNextIntervalAtMaxLevel(Heap<IntegerPriorityObject<CASHInterval>> heap)
Recursive helper method to determine the next "best" interval at maximum level, i.e. the next interval containing the most unprocessed objects. |
private Clustering<Model> |
doRun(Relation<ParameterizationFunction> relation,
FiniteProgress progress)
Runs the CASH algorithm on the specified database. This method is called recursively until only noise is left. |
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query. |
protected Logging |
getLogger()
Get the (STATIC) logger for this class. |
private void |
initHeap(Heap<IntegerPriorityObject<CASHInterval>> heap,
Relation<ParameterizationFunction> relation,
int dim,
DBIDs ids)
Initializes the heap with the root intervals. |
private ParameterizationFunction |
project(Matrix basis,
ParameterizationFunction f)
Projects the specified parameterization function into the subspace described by the given basis. |
Clustering<Model> |
run(Database database,
Relation<ParameterizationFunction> relation)
Run CASH on the relation. |
private Matrix |
runDerivator(Relation<ParameterizationFunction> relation,
int dim,
CASHInterval interval,
ModifiableDBIDs ids)
Runs the derivator on the specified interval and assigns to this model all points whose distance to the derivator model is less than the model's standard deviation. |
private LinearEquationSystem |
runDerivator(Relation<ParameterizationFunction> relation,
int dimensionality,
DBIDs ids)
Runs the derivator on the specified ids and assigns to this model all points whose distance to the derivator model is less than the model's standard deviation. |
private double |
sinusProduct(int start,
int end,
double[] alpha)
Computes the product of the sine values of the specified angles from start to end index. |
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm |
---|
makeParameterDistanceFunction, run |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm |
---|
run |
Field Detail |
---|
private static final Logging logger
public static final OptionID MINPTS_ID
Key: -cash.minpts
public static final OptionID MAXLEVEL_ID
Key: -cash.maxlevel
public static final OptionID MINDIM_ID
Default value: 1
Key: -cash.mindim
public static final OptionID JITTER_ID
Key: -cash.jitter
public static final OptionID ADJUST_ID
Key: -cash.adjust
private int minPts
Holds the value of MINPTS_ID.
private int maxLevel
Holds the value of MAXLEVEL_ID.
private int minDim
Holds the value of MINDIM_ID.
private double jitter
Holds the value of JITTER_ID.
private boolean adjust
Holds the value of ADJUST_ID.
private int noiseDim
private ModifiableDBIDs processedIDs
private Relation<ParameterizationFunction> fulldatabase
Constructor Detail |
---|
public CASH(int minPts, int maxLevel, int minDim, double jitter, boolean adjust)
minPts - MinPts parameter
maxLevel - Maximum level
minDim - Minimum dimensionality
jitter - Jitter
adjust - Adjust
Method Detail |
---|
public Clustering<Model> run(Database database, Relation<ParameterizationFunction> relation)
database - Database
relation - Relation
private Clustering<Model> doRun(Relation<ParameterizationFunction> relation, FiniteProgress progress) throws UnableToComplyException, ParameterException, NonNumericFeaturesException
relation - the Relation to run the CASH algorithm on
progress - the progress object for verbose messages
UnableToComplyException - if a database-related error occurs
ParameterException - if the parameter setting is wrong
NonNumericFeaturesException - if non-numeric feature vectors are used
private void initHeap(Heap<IntegerPriorityObject<CASHInterval>> heap, Relation<ParameterizationFunction> relation, int dim, DBIDs ids)
heap - the heap to be initialized
relation - the database storing the parameterization functions
dim - the dimensionality of the database
ids - the ids of the database
private MaterializedRelation<ParameterizationFunction> buildDB(int dim, Matrix basis, DBIDs ids, Relation<ParameterizationFunction> relation) throws UnableToComplyException
dim - the dimensionality of the database
basis - the basis defining the subspace
ids - the ids for the new database
relation - the database storing the parameterization functions
UnableToComplyException - if a database-related error occurs
private ParameterizationFunction project(Matrix basis, ParameterizationFunction f)
basis - the basis defining the subspace
f - the parameterization function to be projected
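A minimal sketch of what projecting into a subspace can look like, assuming the function is represented by the coordinate vector of its underlying point and that projection means taking coordinates with respect to the basis columns. Both are assumptions, since this page does not say how functions are stored or how the Matrix is applied. All names below are hypothetical.

```java
/** Hypothetical illustration of projecting a vector onto the columns of a basis; not part of ELKI. */
final class ProjectionSketch {
    /**
     * Computes the coordinates of p with respect to the basis columns.
     * The basis is stored as basis[row][column] with d rows and k columns.
     */
    static double[] project(double[][] basis, double[] p) {
        int d = basis.length;
        int k = basis[0].length;
        if (p.length != d) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] out = new double[k];
        for (int c = 0; c < k; c++) {
            double sum = 0.0;
            for (int r = 0; r < d; r++) {
                sum += p[r] * basis[r][c]; // dot product of p with basis column c
            }
            out[c] = sum;
        }
        return out;
    }
}
```

With the basis columns (1, 0, 0) and (0, 1, 0), the point (1, 2, 3) is mapped to (1, 2), so the result has one coordinate per basis column.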
private Matrix determineBasis(double[] alpha)
alpha - the alpha values
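One common way to turn d-1 angles into a direction in d dimensions is the spherical-coordinate form, where component i is the product of sin(alpha[j]) for all j < i, times cos(alpha[i]), and the last component has no cosine factor. The sketch below computes such a unit normal vector. Whether determineBasis uses exactly this convention, and how it completes the vector to a basis, is not stated on this page, so both should be read as assumptions. The names are hypothetical.

```java
/** Hypothetical illustration of building a unit normal vector from angles; not part of ELKI. */
final class NormalVectorSketch {
    /** Returns a unit vector with alpha.length + 1 components in spherical-coordinate form. */
    static double[] normal(double[] alpha) {
        double[] n = new double[alpha.length + 1];
        double sinProd = 1.0; // running product of sin(alpha[j]) for j < i
        for (int i = 0; i < n.length; i++) {
            // The last component has no angle of its own, so its cosine factor is cos(0) = 1.
            double angle = (i < alpha.length) ? alpha[i] : 0.0;
            n[i] = sinProd * Math.cos(angle);
            if (i < alpha.length) {
                sinProd *= Math.sin(alpha[i]);
            }
        }
        return n;
    }
}
```

A subspace basis could then be obtained as an orthonormal basis of the hyperplane orthogonal to this vector, for instance by Gram-Schmidt orthogonalization.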
private double sinusProduct(int start, int end, double[] alpha)
start - the index to start
end - the index to end
alpha - the array of angles
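A minimal sketch of such a sine product, assuming the end index is exclusive (the page does not say whether it is inclusive). The class and method names are hypothetical.

```java
/** Hypothetical illustration of a product of sines over an index range; not part of ELKI. */
final class SineProductSketch {
    /**
     * Multiplies sin(alpha[j]) for j = start, ..., end - 1.
     * Assumption: end is exclusive, so an empty range yields 1.
     */
    static double sineProduct(int start, int end, double[] alpha) {
        double result = 1.0;
        for (int j = start; j < end; j++) {
            result *= Math.sin(alpha[j]);
        }
        return result;
    }
}
```

Under this convention, sineProduct(0, 2, alpha) with alpha = {pi/2, pi/6, 1.0} multiplies sin(pi/2) and sin(pi/6) and gives approximately 0.5. The third angle is not used.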
private CASHInterval determineNextIntervalAtMaxLevel(Heap<IntegerPriorityObject<CASHInterval>> heap)
heap - the heap storing the intervals
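The selection of a "best" interval can be pictured as a best-first search over a priority queue. The sketch below is a simplified stand-in, not the class's implementation: it uses a hypothetical Interval type with pre-built children instead of CASHInterval and its splitting logic, assumes every child holds a subset of its parent's ids, and assumes each interval lists only unprocessed ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical illustration of best-first interval selection; not part of ELKI. */
final class BestIntervalSketch {
    /** Hypothetical stand-in for an interval; ids are the unprocessed objects it contains. */
    static final class Interval {
        final String name;
        final int level;
        final Set<Integer> ids;
        final List<Interval> children = new ArrayList<>();

        Interval(String name, int level, Integer... ids) {
            this.name = name;
            this.level = level;
            this.ids = new HashSet<>(Arrays.asList(ids));
        }

        Interval add(Interval child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Returns the interval at maxLevel with the most ids, or null if no
     * candidate with at least minPts ids remains.
     */
    static Interval nextBest(Interval root, int maxLevel, int minPts) {
        // Max-heap: the interval with the most ids is polled first.
        PriorityQueue<Interval> heap = new PriorityQueue<>(
            Comparator.comparingInt((Interval i) -> i.ids.size()).reversed());
        heap.add(root);
        while (!heap.isEmpty()) {
            Interval best = heap.poll();
            if (best.ids.size() < minPts) {
                return null; // everything still queued is at most this large
            }
            if (best.level >= maxLevel) {
                return best; // children are subsets, so nothing deeper can be larger
            }
            heap.addAll(best.children);
        }
        return null;
    }
}
```

Because a child never holds more ids than its parent, the first interval at the maximum level that leaves the queue is also the largest one at that level.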
private CASHInterval doDetermineNextIntervalAtMaxLevel(Heap<IntegerPriorityObject<CASHInterval>> heap)
heap - the heap storing the intervals
private double[] determineMinMaxDistance(Relation<ParameterizationFunction> relation, int dimensionality)
relation - the database containing the parameterization functions
dimensionality - the dimensionality of the database
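To make the idea of a global minimum and maximum function value concrete, the sketch below handles the two-dimensional case only, assuming each function has the form f(alpha) = x*cos(alpha) + y*sin(alpha) and that the angle ranges over [0, pi]. Both the functional form and the angle range are assumptions, as this page does not state them, and all names are hypothetical.

```java
/** Hypothetical 2D illustration of finding global function extrema; not part of ELKI. */
final class MinMaxSketch2D {
    /**
     * Returns {min, max} of f(alpha) = x*cos(alpha) + y*sin(alpha) over
     * alpha in [0, pi], taken across all given points (x, y).
     * For an empty input the result is {+Infinity, -Infinity}.
     */
    static double[] minMax(double[][] points) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            double x = p[0];
            double y = p[1];
            double r = Math.hypot(x, y);    // amplitude: f(alpha) = r * cos(alpha - phi)
            double phi = Math.atan2(y, x);  // phase in [-pi, pi]
            // The peak at alpha = phi lies inside [0, pi] only if phi >= 0;
            // otherwise the maximum is at a boundary: max(f(0), f(pi)) = |x|.
            double localMax = (phi >= 0) ? r : Math.abs(x);
            // The trough at alpha = phi + pi lies inside [0, pi] only if phi <= 0;
            // otherwise the minimum is at a boundary: min(f(0), f(pi)) = -|x|.
            double localMin = (phi <= 0) ? -r : -Math.abs(x);
            min = Math.min(min, localMin);
            max = Math.max(max, localMax);
        }
        return new double[] { min, max };
    }
}
```

For the points (1, 1) and (1, -1), the first function ranges over [-1, sqrt(2)] and the second over [-sqrt(2), 1], so the combined result is {-sqrt(2), sqrt(2)}.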
private Matrix runDerivator(Relation<ParameterizationFunction> relation, int dim, CASHInterval interval, ModifiableDBIDs ids) throws UnableToComplyException, ParameterException
relation - the database containing the parameterization functions
interval - the interval to build the model
dim - the dimensionality of the database
ids - an empty set to assign the ids
UnableToComplyException - if a database-related error occurs
ParameterException - if the parameter setting is wrong
private Database buildDerivatorDB(Relation<ParameterizationFunction> relation, CASHInterval interval) throws UnableToComplyException
relation - the database storing the parameterization functions
interval - the interval to build the database from
UnableToComplyException - if a database-related error occurs
private LinearEquationSystem runDerivator(Relation<ParameterizationFunction> relation, int dimensionality, DBIDs ids)
relation - the database containing the parameterization functions
ids - the ids to build the model
dimensionality - the dimensionality of the subspace
private Database buildDerivatorDB(Relation<ParameterizationFunction> relation, DBIDs ids) throws UnableToComplyException
relation - the database storing the parameterization functions
ids - the ids to build the database from
UnableToComplyException - if initialization of the database is not possible
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<Model>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.
getLogger in class AbstractAlgorithm<Clustering<Model>>