V
- the type of NumberVector handled by this Algorithm@Title(value="DiSH: Detecting Subspace cluster Hierarchies") @Description(value="Algorithm to find hierarchical correlation clusters in subspaces.") @Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, I. M\u00fcller-Gorman, A. Zimek", title="Detection and Visualization of Subspace Cluster Hierarchies", booktitle="Proc. 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand, 2007", url="http://dx.doi.org/10.1007/978-3-540-71703-4_15") public class DiSH<V extends NumberVector<V,?>> extends AbstractAlgorithm<Clustering<SubspaceModel<V>>> implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>>
Algorithm for detecting subspace hierarchies.
Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek:
Detection and Visualization of Subspace Cluster Hierarchies.
In Proc. 12th International Conference on Database Systems for Advanced
Applications (DASFAA), Bangkok, Thailand, 2007.
Modifier and Type | Class and Description |
---|---|
static class |
DiSH.Parameterizer<V extends NumberVector<V,?>>
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
private DiSHDistanceFunction |
dishDistance
The distance function we use
|
private double |
epsilon
Holds the value of
EPSILON_ID . |
static OptionID |
EPSILON_ID
Parameter that specifies the maximum radius of the neighborhood to be
considered in each dimension for determination of the preference vector,
must be a double equal to or greater than 0.
|
private static Logging |
logger
The logger for this class.
|
static OptionID |
MU_ID
Parameter that specifies the a minimum number of points as a smoothing
factor to avoid the single-link-effect, must be an integer greater than 0.
|
private Collection<Pair<OptionID,Object>> |
opticsAlgorithmParameters
Parameters that were given to OPTICS
|
Constructor and Description |
---|
DiSH(double epsilon,
DiSHDistanceFunction dishDistance,
Collection<Pair<OptionID,Object>> opticsAlgorithmParameters)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private void |
buildHierarchy(Relation<V> database,
DiSHDistanceFunction.Instance<V> distFunc,
List<Cluster<SubspaceModel<V>>> clusters,
int dimensionality)
Builds the cluster hierarchy.
|
private void |
checkClusters(Relation<V> database,
DiSHDistanceFunction.Instance<V> distFunc,
Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap,
int minpts)
Removes the clusters with size < minpts from the cluster map and adds them
to their parents.
|
private Clustering<SubspaceModel<V>> |
computeClusters(Relation<V> database,
ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder,
DiSHDistanceFunction.Instance<V> distFunc)
Computes the hierarchical clusters according to the cluster order.
|
private Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> |
extractClusters(Relation<V> database,
DiSHDistanceFunction.Instance<V> distFunc,
ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
Extracts the clusters from the cluster order.
|
private Pair<BitSet,ArrayModifiableDBIDs> |
findParent(Relation<V> database,
DiSHDistanceFunction.Instance<V> distFunc,
Pair<BitSet,ArrayModifiableDBIDs> child,
Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
Returns the parent of the specified cluster
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
private boolean |
isParent(Relation<V> database,
DiSHDistanceFunction.Instance<V> distFunc,
Cluster<SubspaceModel<V>> parent,
List<Cluster<SubspaceModel<V>>> children)
Returns true, if the specified parent cluster is a parent of one child of
the children clusters.
|
Clustering<SubspaceModel<V>> |
run(Database database,
Relation<V> relation)
Performs the DiSH algorithm on the given database.
|
private List<Cluster<SubspaceModel<V>>> |
sortClusters(Relation<V> database,
Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
Returns a sorted list of the clusters w.r.t. the subspace dimensionality in
descending order.
|
makeParameterDistanceFunction, run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging logger
public static final OptionID EPSILON_ID
Default value: 0.001
Key: -dish.epsilon
public static final OptionID MU_ID
Default value: 1
Key: -dish.mu
private double epsilon
EPSILON_ID
.private DiSHDistanceFunction dishDistance
private Collection<Pair<OptionID,Object>> opticsAlgorithmParameters
public DiSH(double epsilon, DiSHDistanceFunction dishDistance, Collection<Pair<OptionID,Object>> opticsAlgorithmParameters)
epsilon
- Epsilon valuedishDistance
- Distance functionopticsAlgorithmParameters
- OPTICS parameterspublic Clustering<SubspaceModel<V>> run(Database database, Relation<V> relation) throws IllegalStateException
IllegalStateException
private Clustering<SubspaceModel<V>> computeClusters(Relation<V> database, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder, DiSHDistanceFunction.Instance<V> distFunc)
database
- the database holding the objectsclusterOrder
- the cluster orderdistFunc
- Distance functionprivate Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> extractClusters(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, ClusterOrderResult<PreferenceVectorBasedCorrelationDistance> clusterOrder)
database
- the database storing the objectsdistFunc
- the distance functionclusterOrder
- the cluster order to extract the clusters fromprivate List<Cluster<SubspaceModel<V>>> sortClusters(Relation<V> database, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
database
- the database storing the objectsclustersMap
- the mapping of bits sets to clustersprivate void checkClusters(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap, int minpts)
database
- the database storing the objectsdistFunc
- the distance functionclustersMap
- the map containing the clustersminpts
- MinPtsprivate Pair<BitSet,ArrayModifiableDBIDs> findParent(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Pair<BitSet,ArrayModifiableDBIDs> child, Map<BitSet,List<Pair<BitSet,ArrayModifiableDBIDs>>> clustersMap)
database
- the database storing the objectsdistFunc
- the distance functionchild
- the child to search the parent forclustersMap
- the map containing the clustersprivate void buildHierarchy(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, List<Cluster<SubspaceModel<V>>> clusters, int dimensionality)
distFunc
- the distance functionclusters
- the sorted list of clustersdimensionality
- the dimensionality of the datadatabase
- the database containing the data objectsprivate boolean isParent(Relation<V> database, DiSHDistanceFunction.Instance<V> distFunc, Cluster<SubspaceModel<V>> parent, List<Cluster<SubspaceModel<V>>> children)
database
- the database containing the objectsdistFunc
- the distance function for distance computation between the
clustersparent
- the parent to be testedchildren
- the list of children to be testedpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>