V - the type of NumberVector handled by this Algorithm
@Title(value="DiSH: Detecting Subspace cluster Hierarchies")
@Description(value="Algorithm to find hierarchical correlation clusters in subspaces.")
@Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek", title="Detection and Visualization of Subspace Cluster Hierarchies", booktitle="Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007", url="http://dx.doi.org/10.1007/978-3-540-71703-4_15")
public class DiSH<V extends NumberVector>
extends AbstractAlgorithm<Clustering<SubspaceModel>>
implements SubspaceClusteringAlgorithm<SubspaceModel>
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek:
Detection and Visualization of Subspace Cluster Hierarchies.
In Proc. 12th International Conference on Database Systems for Advanced
Applications (DASFAA), Bangkok, Thailand, 2007.
Modifier and Type | Class and Description |
---|---|
static class | DiSH.DiSHClusterOrder: DiSH cluster order. |
private class | DiSH.Instance: OPTICS variant used by DiSH internally. |
static class | DiSH.Parameterizer<V extends NumberVector>: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private DiSHPreferenceVectorIndex.Factory<V> | dishPreprocessor: The DiSH preprocessor. |
private double | epsilon: Holds the value of DiSH.Parameterizer.EPSILON_ID. |
private static Logging | LOG: The logger for this class. |
private int | mu: OPTICS minPts parameter. |
Constructor and Description |
---|
DiSH(double epsilon, int mu, DiSHPreferenceVectorIndex.Factory<V> dishPreprocessor): Constructor. |
Modifier and Type | Method and Description |
---|---|
private void | buildHierarchy(Relation<V> database, Clustering<SubspaceModel> clustering, List<Cluster<SubspaceModel>> clusters, int dimensionality): Builds the cluster hierarchy. |
private void | checkClusters(Relation<V> relation, TCustomHashMap<long[],List<ArrayModifiableDBIDs>> clustersMap): Removes the clusters with size < minpts from the cluster map and adds them to their parents. |
private Clustering<SubspaceModel> | computeClusters(Relation<V> database, DiSH.DiSHClusterOrder clusterOrder): Computes the hierarchical clusters according to the cluster order. |
private TCustomHashMap<long[],List<ArrayModifiableDBIDs>> | extractClusters(Relation<V> relation, DiSH.DiSHClusterOrder clusterOrder): Extracts the clusters from the cluster order. |
private Pair<long[],ArrayModifiableDBIDs> | findParent(Relation<V> relation, Pair<long[],ArrayModifiableDBIDs> child, TCustomHashMap<long[],List<ArrayModifiableDBIDs>> clustersMap): Returns the parent of the specified cluster. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
private boolean | isParent(Relation<V> relation, Cluster<SubspaceModel> parent, Hierarchy.Iter<Cluster<SubspaceModel>> iter, int db_dim): Returns true if the specified parent cluster is a parent of one of the given child clusters. |
Clustering<SubspaceModel> | run(Database db, Relation<V> relation): Performs the DiSH algorithm on the given database. |
private List<Cluster<SubspaceModel>> | sortClusters(Relation<V> relation, TCustomHashMap<long[],List<ArrayModifiableDBIDs>> clustersMap): Returns a list of the clusters sorted by subspace dimensionality in descending order. |
private int | subspaceDimensionality(NumberVector v1, NumberVector v2, long[] pv1, long[] pv2, long[] commonPreferenceVector): Compute the common subspace dimensionality of two vectors. |
protected static double | weightedDistance(NumberVector v1, NumberVector v2, long[] weightVector): Computes the weighted distance between the two specified vectors according to the given preference vector. |
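As an illustration of what weightedDistance describes, the following is a minimal, self-contained sketch and not the ELKI implementation. It assumes that the long[] preference vector is a bitmask with one bit per dimension and that the distance is Euclidean over the dimensions whose bit is set. The class name WeightedDistanceSketch and the use of plain double[] arrays in place of NumberVector are hypothetical choices made so the example runs without ELKI.

```java
public class WeightedDistanceSketch {
  /**
   * Euclidean distance restricted to the dimensions selected by the bitmask.
   * Assumption: bit d of the mask (word d / 64, position d % 64) marks
   * dimension d as contributing to the distance.
   */
  public static double weightedDistance(double[] v1, double[] v2, long[] mask) {
    double sum = 0.0;
    for (int d = 0; d < v1.length; d++) {
      int word = d >>> 6; // index of the 64-bit word holding bit d
      boolean selected = word < mask.length && (mask[word] & (1L << (d & 63))) != 0;
      if (selected) {
        double diff = v1[d] - v2[d];
        sum += diff * diff;
      }
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    double[] a = {0.0, 0.0, 5.0};
    double[] b = {3.0, 4.0, 9.0};
    long[] firstTwoDims = {0b011L}; // only dimensions 0 and 1 are selected
    System.out.println(weightedDistance(a, b, firstTwoDims)); // prints 5.0
  }
}
```

The third dimension differs by 4.0 but is ignored because its bit is not set, so only the 3-4-5 triangle in the first two dimensions contributes.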
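Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited via interface SubspaceClusteringAlgorithm: run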
private static final Logging LOG
private double epsilon
Holds the value of DiSH.Parameterizer.EPSILON_ID.
private DiSHPreferenceVectorIndex.Factory<V extends NumberVector> dishPreprocessor
private int mu
public DiSH(double epsilon, int mu, DiSHPreferenceVectorIndex.Factory<V> dishPreprocessor)
epsilon - Epsilon value
mu - Mu parameter (minPts)
dishPreprocessor - DiSH preprocessor
public Clustering<SubspaceModel> run(Database db, Relation<V> relation)
relation - Relation to process
private Clustering<SubspaceModel> computeClusters(Relation<V> database, DiSH.DiSHClusterOrder clusterOrder)
database - the database holding the objects
clusterOrder - the cluster order
private TCustomHashMap<long[],List<ArrayModifiableDBIDs>> extractClusters(Relation<V> relation, DiSH.DiSHClusterOrder clusterOrder)
relation - the database storing the objects
clusterOrder - the cluster order to extract the clusters from
private List<Cluster<SubspaceModel>> sortClusters(Relation<V> relation, TCustomHashMap<long[],List<ArrayModifiableDBIDs>> clustersMap)
relation - the database storing the objects
clustersMap - the mapping of bit sets to clusters
private void checkClusters(Relation<V> relation, TCustomHashMap<long[],List<ArrayModifiableDBIDs>> clustersMap)
relation - the relation storing the objects
clustersMap - the map containing the clusters
private Pair<long[],ArrayModifiableDBIDs> findParent(Relation<V> relation, Pair<long[],ArrayModifiableDBIDs> child, TCustomHashMap<long[],List<ArrayModifiableDBIDs>> clustersMap)
relation - the relation storing the objects
child - the child to search the parent for
clustersMap - the map containing the clusters
private void buildHierarchy(Relation<V> database, Clustering<SubspaceModel> clustering, List<Cluster<SubspaceModel>> clusters, int dimensionality)
clustering - Clustering we process
clusters - the sorted list of clusters
dimensionality - the dimensionality of the data
database - the database containing the data objects
private boolean isParent(Relation<V> relation, Cluster<SubspaceModel> parent, Hierarchy.Iter<Cluster<SubspaceModel>> iter, int db_dim)
relation - the database containing the objects
parent - the parent to be tested
iter - the list of children to be tested
db_dim - Database dimensionality
private int subspaceDimensionality(NumberVector v1, NumberVector v2, long[] pv1, long[] pv2, long[] commonPreferenceVector)
v1 - First vector
v2 - Second vector
pv1 - First preference
pv2 - Second preference
commonPreferenceVector - Common preference
protected static double weightedDistance(NumberVector v1, NumberVector v2, long[] weightVector)
v1 - the first vector
v2 - the second vector
weightVector - the preference vector
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>
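The ordering that sortClusters describes (clusters sorted by subspace dimensionality, descending) can be sketched without ELKI as follows. This is a rough illustration under stated assumptions, not the library's code: it assumes each cluster is keyed by a long[] bitmask, as in clustersMap, and that the subspace dimensionality is the number of set bits in that mask. If the dimensionality is instead counted from the unset bits, the comparison would need to be inverted. The class name SortBySubspaceDimSketch and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortBySubspaceDimSketch {
  /** Number of set bits across all words of the bitmask. */
  public static int cardinality(long[] mask) {
    int bits = 0;
    for (long word : mask) {
      bits += Long.bitCount(word);
    }
    return bits;
  }

  /** Returns a new list ordered by descending number of set bits. */
  public static List<long[]> sortDescending(List<long[]> masks) {
    List<long[]> sorted = new ArrayList<>(masks);
    Comparator<long[]> byDim = Comparator.comparingInt(SortBySubspaceDimSketch::cardinality);
    sorted.sort(byDim.reversed()); // List.sort is stable, so ties keep their input order
    return sorted;
  }

  public static void main(String[] args) {
    List<long[]> masks = Arrays.asList(
        new long[] {0b001L},  // 1 set bit
        new long[] {0b111L},  // 3 set bits
        new long[] {0b011L}); // 2 set bits
    for (long[] mask : sortDescending(masks)) {
      System.out.println(Long.toBinaryString(mask[0]) + " -> " + cardinality(mask));
    }
  }
}
```

Sorting a copy rather than the input list keeps the caller's data unchanged, which mirrors the summary's wording that the method returns a sorted list.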
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München.