
Type Parameters:
 V - the type of FeatureVector handled by this Algorithm

@Title(value="SUBCLU: Density connected Subspace Clustering")
@Description(value="Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.")
@Reference(authors="K. Kailing, H.-P. Kriegel, P. Kr\u00f6ger", title="Density connected Subspace Clustering for High Dimensional Data. ", booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04), Lake Buena Vista, FL, 2004")
public class SUBCLU<V extends NumberVector<?>>
extends AbstractAlgorithm<Clustering<SubspaceModel<V>>>
implements SubspaceClusteringAlgorithm<SubspaceModel<V>>

Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.
 Reference: 
 K. Kailing, H.-P. Kriegel, P. Kroeger: Density connected Subspace Clustering
 for High Dimensional Data. 
 In Proc. SIAM Int. Conf. on Data Mining (SDM'04), Lake Buena Vista, FL, 2004.
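
To make "applied to this subspace separately" concrete, the following minimal sketch computes a Euclidean distance restricted to a chosen set of dimensions. It is only an illustration of the idea: the class name `SubspaceDistanceSketch`, the use of `double[]` vectors, and the `BitSet` encoding of a subspace are assumptions made for this example and are not the ELKI types listed on this page.

```java
import java.util.BitSet;

// Illustrative sketch only; not the ELKI distance function.
class SubspaceDistanceSketch {
  /** Euclidean distance that uses only the dimensions whose bits are set in dims. */
  static double distance(double[] a, double[] b, BitSet dims) {
    double sum = 0.0;
    for (int d = dims.nextSetBit(0); d >= 0; d = dims.nextSetBit(d + 1)) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    double[] p = {0.0, 0.0, 10.0};
    double[] q = {3.0, 4.0, -10.0};
    BitSet xy = new BitSet();
    xy.set(0);
    xy.set(1);
    // Dimension 2 is ignored, so the large difference there has no effect.
    System.out.println(distance(p, q, xy)); // prints 5.0
  }
}
```

Two points that are far apart in the full space can therefore be close neighbors in a subspace, which is why density-based clusters may exist in some subspaces and not in others.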
 
| Modifier and Type | Class | Description |
|---|---|---|
| static class | SUBCLU.Parameterizer<V extends NumberVector<?>> | Parameterization class. |

| Modifier and Type | Field | Description |
|---|---|---|
| static OptionID | DISTANCE_FUNCTION_ID | The distance function to determine the distance between database objects. |
| private AbstractDimensionsSelectingDoubleDistanceFunction<V> | distanceFunction | Holds the instance of the distance function specified by DISTANCE_FUNCTION_ID. |
| private DoubleDistance | epsilon | Holds the value of EPSILON_ID. |
| static OptionID | EPSILON_ID | Parameter to specify the maximum radius of the neighborhood to be considered, must be suitable for AbstractDimensionsSelectingDoubleDistanceFunction. |
| private static Logging | LOG | The logger for this class. |
| private int | minpts | Holds the value of MINPTS_ID. |
| static OptionID | MINPTS_ID | Parameter to specify the threshold for minimum number of points in the epsilon-neighborhood of a point, must be an integer greater than 0. |
| private Clustering<SubspaceModel<V>> | result | Holds the result. |

| Constructor | Description |
|---|---|
| SUBCLU(AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction, DoubleDistance epsilon, int minpts) | Constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| private Subspace | bestSubspace(List<Subspace> subspaces, Subspace candidate, TreeMap<Subspace,List<Cluster<Model>>> clusterMap) | Determines the d-dimensional subspace of the (d+1)-dimensional candidate with minimal number of objects in the cluster. |
| private List<Subspace> | generateSubspaceCandidates(List<Subspace> subspaces) | Generates (d+1)-dimensional subspace candidates from the specified d-dimensional subspaces. |
| TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger() | Get the (STATIC) logger for this class. |
| Clustering<SubspaceModel<V>> | getResult() | Returns the result of the algorithm. |
| private List<Subspace> | lowerSubspaces(Subspace subspace) | Returns the list of all (d-1)-dimensional subspaces of the specified d-dimensional subspace. |
| Clustering<SubspaceModel<V>> | run(Relation<V> relation) | Performs the SUBCLU algorithm on the given database. |
| private List<Cluster<Model>> | runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace) | Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. |
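
The candidate generation and the (d-1)-dimensional projections summarized in the table can be sketched as follows. This is a simplified, Apriori-style variant written for illustration: it joins any two d-dimensional subspaces that differ in exactly one dimension and keeps the result only if all of its d-dimensional projections are present. The class `SubspaceCandidatesSketch`, the helper `of`, and the `BitSet` representation are hypothetical; the actual private methods of SUBCLU operate on ELKI's `Subspace` type and may join and prune differently.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not the ELKI implementation.
class SubspaceCandidatesSketch {
  /** Hypothetical helper: builds a subspace from dimension indices. */
  static BitSet of(int... dims) {
    BitSet s = new BitSet();
    for (int d : dims) {
      s.set(d);
    }
    return s;
  }

  /** Returns all (d-1)-dimensional subspaces of the given d-dimensional subspace. */
  static List<BitSet> lowerSubspaces(BitSet subspace) {
    List<BitSet> result = new ArrayList<>();
    for (int d = subspace.nextSetBit(0); d >= 0; d = subspace.nextSetBit(d + 1)) {
      BitSet lower = (BitSet) subspace.clone();
      lower.clear(d); // drop exactly one dimension
      result.add(lower);
    }
    return result;
  }

  /** Generates (d+1)-dimensional candidates from d-dimensional subspaces of equal dimensionality. */
  static List<BitSet> generateCandidates(List<BitSet> subspaces) {
    Set<BitSet> known = new LinkedHashSet<>(subspaces);
    Set<BitSet> candidates = new LinkedHashSet<>(); // removes duplicate joins
    for (int i = 0; i < subspaces.size(); i++) {
      for (int j = i + 1; j < subspaces.size(); j++) {
        BitSet union = (BitSet) subspaces.get(i).clone();
        union.or(subspaces.get(j));
        // Join only pairs that differ in exactly one dimension.
        if (union.cardinality() != subspaces.get(i).cardinality() + 1) {
          continue;
        }
        // Prune candidates with a d-dimensional projection that is not in the input.
        if (known.containsAll(lowerSubspaces(union))) {
          candidates.add(union);
        }
      }
    }
    return new ArrayList<>(candidates);
  }
}
```

For the input subspaces {0,1}, {0,2}, {1,2} and {2,3}, only {0,1,2} survives: {0,2,3} and {1,2,3} are dropped because {0,3} and {1,3} are missing from the input.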
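
Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from implemented interfaces: run

private static final Logging LOG
The logger for this class.
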
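public static final OptionID DISTANCE_FUNCTION_ID
The distance function to determine the distance between database objects.
 Default value: SubspaceEuclideanDistanceFunction
 Key: -subclu.distancefunction

public static final OptionID EPSILON_ID
Parameter to specify the maximum radius of the neighborhood to be considered, must be suitable for AbstractDimensionsSelectingDoubleDistanceFunction.
 Key: -subclu.epsilon

public static final OptionID MINPTS_ID
Parameter to specify the threshold for minimum number of points in the epsilon-neighborhood of a point, must be an integer greater than 0.
 Key: -subclu.minpts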
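
How the epsilon and minpts parameters interact can be illustrated with a small core-point test. The sketch below assumes that the epsilon-neighborhood of a point includes the point itself and that a point is a core point when its neighborhood holds at least minpts objects, as in the usual DBSCAN definition. The class `CorePointSketch` and the array-based data layout are hypothetical and independent of the ELKI classes on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the ELKI implementation.
class CorePointSketch {
  /** Indices of all points within epsilon of the query point, measured only in dims. */
  static List<Integer> neighborhood(double[][] data, int query, int[] dims, double epsilon) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      double sum = 0.0;
      for (int d : dims) {
        double diff = data[i][d] - data[query][d];
        sum += diff * diff;
      }
      if (Math.sqrt(sum) <= epsilon) {
        result.add(i); // the query point itself is always included
      }
    }
    return result;
  }

  /** A point is a core point if its epsilon-neighborhood contains at least minpts objects. */
  static boolean isCore(double[][] data, int query, int[] dims, double epsilon, int minpts) {
    return neighborhood(data, query, dims, epsilon).size() >= minpts;
  }

  public static void main(String[] args) {
    double[][] data = {{0.0, 0.0}, {0.1, 5.0}, {0.2, -5.0}, {3.0, 0.0}};
    // Dense in dimension 0 alone, sparse when both dimensions are used.
    System.out.println(isCore(data, 0, new int[] {0}, 0.25, 3));    // prints true
    System.out.println(isCore(data, 0, new int[] {0, 1}, 0.25, 3)); // prints false
  }
}
```

Because the distance depends on the selected dimensions, a value of epsilon that works in a one-dimensional subspace may be too small in higher-dimensional subspaces, where distances can only grow.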
 
private AbstractDimensionsSelectingDoubleDistanceFunction<V extends NumberVector<?>> distanceFunction
Holds the instance of the distance function specified by DISTANCE_FUNCTION_ID.

private DoubleDistance epsilon
Holds the value of EPSILON_ID.

private int minpts
Holds the value of MINPTS_ID.

private Clustering<SubspaceModel<V extends NumberVector<?>>> result
Holds the result.

public SUBCLU(AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction, DoubleDistance epsilon, int minpts)
Constructor.
Parameters:
 distanceFunction - Distance function
 epsilon - Epsilon value
 minpts - Minpts value

public Clustering<SubspaceModel<V>> run(Relation<V> relation)
Performs the SUBCLU algorithm on the given database.
Parameters:
 relation - Relation to process

public Clustering<SubspaceModel<V>> getResult()
Returns the result of the algorithm.

private List<Cluster<Model>> runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. If ids is null, DBSCAN will be applied to the whole database.
Parameters:
 relation - the database holding the objects to run DBSCAN on
 ids - the IDs of the database defining the partition to run DBSCAN on; if this parameter is null, DBSCAN will be applied to the whole database
 subspace - the subspace to run DBSCAN on

private List<Subspace> generateSubspaceCandidates(List<Subspace> subspaces)
Generates (d+1)-dimensional subspace candidates from the specified d-dimensional subspaces.
Parameters:
 subspaces - the d-dimensional subspaces
Returns:
 the (d+1)-dimensional subspace candidates

private List<Subspace> lowerSubspaces(Subspace subspace)
Returns the list of all (d-1)-dimensional subspaces of the specified d-dimensional subspace.
Parameters:
 subspace - the d-dimensional subspace
Returns:
 the list of all (d-1)-dimensional subspaces

private Subspace bestSubspace(List<Subspace> subspaces, Subspace candidate, TreeMap<Subspace,List<Cluster<Model>>> clusterMap)
Determines the d-dimensional subspace of the (d+1)-dimensional candidate with minimal number of objects in the cluster.
Parameters:
 subspaces - the list of d-dimensional subspaces containing clusters
 candidate - the (d+1)-dimensional candidate subspace
 clusterMap - the mapping of subspaces to clusters
Returns:
 the d-dimensional subspace of the (d+1)-dimensional candidate with minimal number of objects in the cluster

public TypeInformation[] getInputTypeRestriction()
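Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.
Specified by:
 getInputTypeRestriction in interface Algorithm
Specified by:
 getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.
Specified by:
 getLogger in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>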
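
As a closing illustration of the bestSubspace selection documented above, the sketch below picks, among the d-dimensional subspaces contained in a (d+1)-dimensional candidate, the one whose clusters hold the fewest objects. It assumes that "number of objects in the cluster" means the total over all clusters of a subspace, and it replaces ELKI's `Subspace` and `Cluster` types with a `BitSet` and a list of cluster sizes; the class `BestSubspaceSketch` and its method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the ELKI implementation.
class BestSubspaceSketch {
  /** True if every dimension of sub is also a dimension of candidate. */
  static boolean isSubspaceOf(BitSet sub, BitSet candidate) {
    BitSet rest = (BitSet) sub.clone();
    rest.andNot(candidate);
    return rest.isEmpty();
  }

  /**
   * Returns the subspace of the candidate with the fewest clustered objects,
   * or null if no listed subspace is contained in the candidate.
   * Assumption: clusterSizes maps each subspace to the sizes of its clusters.
   */
  static BitSet bestSubspace(List<BitSet> subspaces, BitSet candidate,
      Map<BitSet, List<Integer>> clusterSizes) {
    BitSet best = null;
    long min = Long.MAX_VALUE;
    for (BitSet s : subspaces) {
      if (!isSubspaceOf(s, candidate)) {
        continue;
      }
      long total = 0;
      for (int size : clusterSizes.getOrDefault(s, Collections.<Integer>emptyList())) {
        total += size;
      }
      if (total < min) {
        min = total;
        best = s;
      }
    }
    return best;
  }
}
```

Choosing the subspace with the fewest clustered objects keeps the subsequent DBSCAN run on the candidate as small as possible, since only objects already clustered in a lower-dimensional subspace need to be examined.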