de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class DiSH<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<Clustering<SubspaceModel<V>>>
      extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DiSH<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel<V>>>, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="DiSH: Detecting Subspace cluster Hierarchies")
@Description(value="Algorithm to find hierarchical correlation clusters in subspaces.")
@Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek",
           title="Detection and Visualization of Subspace Cluster Hierarchies",
           booktitle="Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007",
           url="http://dx.doi.org/10.1007/978-3-540-71703-4_15")
public class DiSH<V extends NumberVector<V,?>>
extends AbstractAlgorithm<Clustering<SubspaceModel<V>>>
implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>>

Algorithm for detecting subspace hierarchies.

Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek: Detection and Visualization of Subspace Cluster Hierarchies.
In Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007.


Nested Class Summary
static class DiSH.Parameterizer<V extends NumberVector<V,?>>
          Parameterization class.
 
Field Summary
private  DiSHDistanceFunction dishDistance
          The distance function we use
private  double epsilon
          Holds the value of EPSILON_ID.
static OptionID EPSILON_ID
          Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.
private static Logging logger
          The logger for this class.
static OptionID MU_ID
          Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.
private  Collection<Pair<OptionID,Object>> opticsAlgorithmParameters
          Parameters that were given to OPTICS
 
Constructor Summary
DiSH(double epsilon, DiSHDistanceFunction dishDistance, Collection<Pair<OptionID,Object>> opticsAlgorithmParameters)
          Constructor.
 
Method Summary
private  void buildHierarchy(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, List<Cluster<SubspaceModel<V>>> clusters, int dimensionality)
          Builds the cluster hierarchy.
private  void checkClusters(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap, int minpts)
          Removes the clusters with size < minpts from the cluster map and adds them to their parents.
private  Clustering<SubspaceModel<V>> computeClusters(Relation<V> database, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder, DiSHDistanceFunction.Instance<V> distFunc)
          Computes the hierarchical clusters according to the cluster order.
private  Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> extractClusters(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
          Extracts the clusters from the cluster order.
private  Pair<BitSet,ArrayModifiableDBIDs> findParent(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Pair<BitSet,ArrayModifiableDBIDs> child, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
          Returns the parent of the specified cluster
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
private  boolean isParent(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Cluster<SubspaceModel<V>> parent, List<Cluster<SubspaceModel<V>>> children)
          Returns true, if the specified parent cluster is a parent of one child of the children clusters.
 Clustering<SubspaceModel<V>> run(Database database, Relation<V> relation)
          Performs the DiSH algorithm on the given database.
private  List<Cluster<SubspaceModel<V>>> sortClusters(Relation<V> database, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
          Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 

Field Detail

logger

private static final Logging logger
The logger for this class.


EPSILON_ID

public static final OptionID EPSILON_ID
Parameter that specifies the maximum radius of the neighborhood to be considered in each dimension for determination of the preference vector, must be a double equal to or greater than 0.

Default value: 0.001

Key: -dish.epsilon


MU_ID

public static final OptionID MU_ID
Parameter that specifies the a minimum number of points as a smoothing factor to avoid the single-link-effect, must be an integer greater than 0.

Default value: 1

Key: -dish.mu


epsilon

private double epsilon
Holds the value of EPSILON_ID.


dishDistance

private DiSHDistanceFunction dishDistance
The distance function we use


opticsAlgorithmParameters

private Collection<Pair<OptionID,Object>> opticsAlgorithmParameters
Parameters that were given to OPTICS

Constructor Detail

DiSH

public DiSH(double epsilon,
            DiSHDistanceFunction dishDistance,
            Collection<Pair<OptionID,Object>> opticsAlgorithmParameters)
Constructor.

Parameters:
epsilon - Epsilon value
dishDistance - Distance function
opticsAlgorithmParameters - OPTICS parameters
Method Detail

run

public Clustering<SubspaceModel<V>> run(Database database,
                                        Relation<V> relation)
                                                           throws IllegalStateException
Performs the DiSH algorithm on the given database.

Throws:
IllegalStateException

computeClusters

private Clustering<SubspaceModel<V>> computeClusters(Relation<V> database,
                                                     ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder,
                                                     DiSHDistanceFunction.Instance<V> distFunc)
Computes the hierarchical clusters according to the cluster order.

Parameters:
database - the database holding the objects
clusterOrder - the cluster order
distFunc - Distance function

extractClusters

private Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> extractClusters(Relation<V> database,
                                                                            DiSHDistanceFunction.Instance<V> distFunc,
                                                                            ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
Extracts the clusters from the cluster order.

Parameters:
database - the database storing the objects
distFunc - the distance function
clusterOrder - the cluster order to extract the clusters from
Returns:
the extracted clusters

sortClusters

private List<Cluster<SubspaceModel<V>>> sortClusters(Relation<V> database,
                                                     Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
Returns a sorted list of the clusters w.r.t. the subspace dimensionality in descending order.

Parameters:
database - the database storing the objects
clustersMap - the mapping of bits sets to clusters
Returns:
a sorted list of the clusters

checkClusters

private void checkClusters(Relation<V> database,
                           DiSHDistanceFunction.Instance<V> distFunc,
                           Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap,
                           int minpts)
Removes the clusters with size < minpts from the cluster map and adds them to their parents.

Parameters:
database - the database storing the objects
distFunc - the distance function
clustersMap - the map containing the clusters
minpts - MinPts

findParent

private Pair<BitSet,ArrayModifiableDBIDs> findParent(Relation<V> database,
                                                     DiSHDistanceFunction.Instance<V> distFunc,
                                                     Pair<BitSet,ArrayModifiableDBIDs> child,
                                                     Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
Returns the parent of the specified cluster

Parameters:
database - the database storing the objects
distFunc - the distance function
child - the child to search the parent for
clustersMap - the map containing the clusters
Returns:
the parent of the specified cluster

buildHierarchy

private void buildHierarchy(Relation<V> database,
                            DiSHDistanceFunction.Instance<V> distFunc,
                            List<Cluster<SubspaceModel<V>>> clusters,
                            int dimensionality)
Builds the cluster hierarchy.

Parameters:
distFunc - the distance function
clusters - the sorted list of clusters
dimensionality - the dimensionality of the data
database - the database containing the data objects

isParent

private boolean isParent(Relation<V> database,
                         DiSHDistanceFunction.Instance<V> distFunc,
                         Cluster<SubspaceModel<V>> parent,
                         List<Cluster<SubspaceModel<V>>> children)
Returns true, if the specified parent cluster is a parent of one child of the children clusters.

Parameters:
database - the database containing the objects
distFunc - the distance function for distance computation between the clusters
parent - the parent to be tested
children - the list of children to be tested
Returns:
true, if the specified parent cluster is a parent of one child of the children clusters, false otherwise

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)