de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class SUBCLU<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<Clustering<SubspaceModel<V>>>
      extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.SUBCLU<V>
Type Parameters:
V - the type of FeatureVector handled by this Algorithm
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel<V>>>, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="SUBCLU: Density connected Subspace Clustering")
@Description(value="Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.")
@Reference(authors="K. Kailing, H.-P. Kriegel, P. Kr\u00f6ger",
           title="Density connected Subspace Clustering for High Dimensional Data. ",
           booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04), Lake Buena Vista, FL, 2004")
public class SUBCLU<V extends NumberVector<V,?>>
extends AbstractAlgorithm<Clustering<SubspaceModel<V>>>
implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>>

Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.

Reference:
K. Kailing, H.-P. Kriegel, P. Kroeger: Density connected Subspace Clustering for High Dimensional Data.
In Proc. SIAM Int. Conf. on Data Mining (SDM'04), Lake Buena Vista, FL, 2004.


Nested Class Summary
static class SUBCLU.Parameterizer<V extends NumberVector<V,?>>
          Parameterization class.
 
Field Summary
static OptionID DISTANCE_FUNCTION_ID
          The distance function to determine the distance between database objects.
private  AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction
          Holds the instance of the distance function specified by DISTANCE_FUNCTION_ID.
private  DoubleDistance epsilon
          Holds the value of EPSILON_ID.
static OptionID EPSILON_ID
          Parameter to specify the maximum radius of the neighborhood to be considered, must be suitable to AbstractDimensionsSelectingDoubleDistanceFunction.
private static Logging logger
          The logger for this class.
private  int minpts
          Holds the value of MINPTS_ID.
static OptionID MINPTS_ID
          Parameter to specify the threshold for minimum number of points in the epsilon-neighborhood of a point, must be an integer greater than 0.
private  Clustering<SubspaceModel<V>> result
          Holds the result;
 
Constructor Summary
SUBCLU(AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction, DoubleDistance epsilon, int minpts)
          Constructor.
 
Method Summary
private  Subspace<V> bestSubspace(List<Subspace<V>> subspaces, Subspace<V> candidate, TreeMap<Subspace<V>,List<Cluster<Model>>> clusterMap)
          Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.
private  List<Subspace<V>> generateSubspaceCandidates(List<Subspace<V>> subspaces)
          Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
 Clustering<SubspaceModel<V>> getResult()
          Returns the result of the algorithm.
private  List<Subspace<V>> lowerSubspaces(Subspace<V> subspace)
          Returns the list of all (d-1)-dimensional subspaces of the specified d-dimensional subspace.
 Clustering<SubspaceModel<V>> run(Relation<V> relation)
          Performs the SUBCLU algorithm on the given database.
private  List<Cluster<Model>> runDBSCAN(Relation<V> relation, DBIDs ids, Subspace<V> subspace)
          Runs the DBSCAN algorithm on the specified partition of the database in the given subspace.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 

Field Detail

logger

private static final Logging logger
The logger for this class.


DISTANCE_FUNCTION_ID

public static final OptionID DISTANCE_FUNCTION_ID
The distance function to determine the distance between database objects.

Default value: DimensionsSelectingEuclideanDistanceFunction

Key: -subclu.distancefunction


EPSILON_ID

public static final OptionID EPSILON_ID
Parameter to specify the maximum radius of the neighborhood to be considered, must be suitable to AbstractDimensionsSelectingDoubleDistanceFunction.

Key: -subclu.epsilon


MINPTS_ID

public static final OptionID MINPTS_ID
Parameter to specify the threshold for minimum number of points in the epsilon-neighborhood of a point, must be an integer greater than 0.

Key: -subclu.minpts


distanceFunction

private AbstractDimensionsSelectingDoubleDistanceFunction<V extends NumberVector<V,?>> distanceFunction
Holds the instance of the distance function specified by DISTANCE_FUNCTION_ID.


epsilon

private DoubleDistance epsilon
Holds the value of EPSILON_ID.


minpts

private int minpts
Holds the value of MINPTS_ID.


result

private Clustering<SubspaceModel<V extends NumberVector<V,?>>> result
Holds the result;

Constructor Detail

SUBCLU

public SUBCLU(AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction,
              DoubleDistance epsilon,
              int minpts)
Constructor.

Parameters:
distanceFunction - Distance function
epsilon - Epsilon value
minpts - Minpts value
Method Detail

run

public Clustering<SubspaceModel<V>> run(Relation<V> relation)
                                                           throws IllegalStateException
Performs the SUBCLU algorithm on the given database.

Parameters:
relation - Relation to process
Returns:
Clustering result
Throws:
IllegalStateException

getResult

public Clustering<SubspaceModel<V>> getResult()
Returns the result of the algorithm.

Returns:
the result of the algorithm

runDBSCAN

private List<Cluster<Model>> runDBSCAN(Relation<V> relation,
                                       DBIDs ids,
                                       Subspace<V> subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the given subspace. If parameter ids is null DBSCAN will be applied to the whole database.

Parameters:
relation - the database holding the objects to run DBSCAN on
ids - the IDs of the database defining the partition to run DBSCAN on - if this parameter is null DBSCAN will be applied to the whole database
subspace - the subspace to run DBSCAN on
Returns:
the clustering result of the DBSCAN run

generateSubspaceCandidates

private List<Subspace<V>> generateSubspaceCandidates(List<Subspace<V>> subspaces)
Generates d+1-dimensional subspace candidates from the specified d-dimensional subspaces.

Parameters:
subspaces - the d-dimensional subspaces
Returns:
the d+1-dimensional subspace candidates

lowerSubspaces

private List<Subspace<V>> lowerSubspaces(Subspace<V> subspace)
Returns the list of all (d-1)-dimensional subspaces of the specified d-dimensional subspace.

Parameters:
subspace - the d-dimensional subspace
Returns:
a list of all (d-1)-dimensional subspaces

bestSubspace

private Subspace<V> bestSubspace(List<Subspace<V>> subspaces,
                                 Subspace<V> candidate,
                                 TreeMap<Subspace<V>,List<Cluster<Model>>> clusterMap)
Determines the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster.

Parameters:
subspaces - the list of d-dimensional subspaces containing clusters
candidate - the (d+1)-dimensional candidate subspace
clusterMap - the mapping of subspaces to clusters
Returns:
the d-dimensional subspace of the (d+1) -dimensional candidate with minimal number of objects in the cluster

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)