
V - the type of NumberVector handled by this Algorithm@Title(value="DiSH: Detecting Subspace cluster Hierarchies") @Description(value="Algorithm to find hierarchical correlation clusters in subspaces.") @Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek", title="Detection and Visualization of Subspace Cluster Hierarchies", booktitle="Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007", url="http://dx.doi.org/10.1007/978-3-540-71703-4_15") public class DiSH<V extends NumberVector<?>> extends AbstractAlgorithm<Clustering<SubspaceModel<V>>> implements SubspaceClusteringAlgorithm<SubspaceModel<V>>
Algorithm for detecting subspace hierarchies.
 Reference: 
 E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek:
 Detection and Visualization of Subspace Cluster Hierarchies. 
 In Proc. 12th International Conference on Database Systems for Advanced
 Applications (DASFAA), Bangkok, Thailand, 2007.
 
| Modifier and Type | Class and Description | 
|---|---|
| static class  | DiSH.Parameterizer<V extends NumberVector<?>>Parameterization class. | 
| Modifier and Type | Field and Description | 
|---|---|
| private DiSHDistanceFunction | dishDistanceThe distance function we use | 
| private double | epsilonHolds the value of  EPSILON_ID. | 
| static OptionID | EPSILON_IDParameter that specifies the maximum radius of the neighborhood to be
 considered in each dimension for determination of the preference vector,
 must be a double equal to or greater than 0. | 
| private static Logging | LOGThe logger for this class. | 
| static OptionID | MU_IDParameter that specifies the a minimum number of points as a smoothing
 factor to avoid the single-link-effect, must be an integer greater than 0. | 
| private Collection<Pair<OptionID,Object>> | opticsAlgorithmParametersParameters that were given to OPTICS | 
| Constructor and Description | 
|---|
| DiSH(double epsilon,
    DiSHDistanceFunction dishDistance,
    Collection<Pair<OptionID,Object>> opticsAlgorithmParameters)Constructor. | 
| Modifier and Type | Method and Description | 
|---|---|
| private void | buildHierarchy(Relation<V> database,
              DiSHDistanceFunction.Instance<V> distFunc,
              List<Cluster<SubspaceModel<V>>> clusters,
              int dimensionality)Builds the cluster hierarchy. | 
| private void | checkClusters(Relation<V> database,
             DiSHDistanceFunction.Instance<V> distFunc,
             Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap,
             int minpts)Removes the clusters with size < minpts from the cluster map and adds them
 to their parents. | 
| private Clustering<SubspaceModel<V>> | computeClusters(Relation<V> database,
               ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder,
               DiSHDistanceFunction.Instance<V> distFunc)Computes the hierarchical clusters according to the cluster order. | 
| private Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> | extractClusters(Relation<V> database,
               DiSHDistanceFunction.Instance<V> distFunc,
               ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)Extracts the clusters from the cluster order. | 
| private Pair<BitSet,ArrayModifiableDBIDs> | findParent(Relation<V> database,
          DiSHDistanceFunction.Instance<V> distFunc,
          Pair<BitSet,ArrayModifiableDBIDs> child,
          Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)Returns the parent of the specified cluster | 
| TypeInformation[] | getInputTypeRestriction()Get the input type restriction used for negotiating the data query. | 
| protected Logging | getLogger()Get the (STATIC) logger for this class. | 
| private boolean | isParent(Relation<V> database,
        DiSHDistanceFunction.Instance<V> distFunc,
        Cluster<SubspaceModel<V>> parent,
        List<Cluster<SubspaceModel<V>>> children)Returns true, if the specified parent cluster is a parent of one child of
 the children clusters. | 
| Clustering<SubspaceModel<V>> | run(Database database,
   Relation<V> relation)Performs the DiSH algorithm on the given database. | 
| private List<Cluster<SubspaceModel<V>>> | sortClusters(Relation<V> database,
            Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)Returns a sorted list of the clusters w.r.t. the subspace dimensionality in
 descending order. | 
makeParameterDistanceFunction, runclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitrunprivate static final Logging LOG
public static final OptionID EPSILON_ID
 Default value: 0.001
 
 Key: -dish.epsilon
 
public static final OptionID MU_ID
 Default value: 1
 
 Key: -dish.mu
 
private double epsilon
EPSILON_ID.private DiSHDistanceFunction dishDistance
private Collection<Pair<OptionID,Object>> opticsAlgorithmParameters
public DiSH(double epsilon,
    DiSHDistanceFunction dishDistance,
    Collection<Pair<OptionID,Object>> opticsAlgorithmParameters)
epsilon - Epsilon valuedishDistance - Distance functionopticsAlgorithmParameters - OPTICS parameterspublic Clustering<SubspaceModel<V>> run(Database database, Relation<V> relation)
database - Database to processrelation - Relation to processprivate Clustering<SubspaceModel<V>> computeClusters(Relation<V> database, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder, DiSHDistanceFunction.Instance<V> distFunc)
database - the database holding the objectsclusterOrder - the cluster orderdistFunc - Distance functionprivate Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> extractClusters(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
database - the database storing the objectsdistFunc - the distance functionclusterOrder - the cluster order to extract the clusters fromprivate List<Cluster<SubspaceModel<V>>> sortClusters(Relation<V> database, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
database - the database storing the objectsclustersMap - the mapping of bits sets to clustersprivate void checkClusters(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap, int minpts)
database - the database storing the objectsdistFunc - the distance functionclustersMap - the map containing the clustersminpts - MinPtsprivate Pair<BitSet,ArrayModifiableDBIDs> findParent(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Pair<BitSet,ArrayModifiableDBIDs> child, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
database - the database storing the objectsdistFunc - the distance functionchild - the child to search the parent forclustersMap - the map containing the clustersprivate void buildHierarchy(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, List<Cluster<SubspaceModel<V>>> clusters, int dimensionality)
distFunc - the distance functionclusters - the sorted list of clustersdimensionality - the dimensionality of the datadatabase - the database containing the data objectsprivate boolean isParent(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Cluster<SubspaceModel<V>> parent, List<Cluster<SubspaceModel<V>>> children)
database - the database containing the objectsdistFunc - the distance function for distance computation between the
        clustersparent - the parent to be testedchildren - the list of children to be testedpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithmgetInputTypeRestriction in interface AlgorithmgetInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>protected Logging getLogger()
AbstractAlgorithmgetLogger in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>