V
- the type of FeatureVector handled by this Algorithm@Title(value="SUBCLU: Density connected Subspace Clustering") @Description(value="Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.") @Reference(authors="K. Kailing, H.-P. Kriegel, P. Kr\u00f6ger", title="Density connected Subspace Clustering for High Dimensional Data. ", booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04), Lake Buena Vista, FL, 2004") public class SUBCLU<V extends NumberVector<?>> extends AbstractAlgorithm<Clustering<SubspaceModel<V>>> implements SubspaceClusteringAlgorithm<SubspaceModel<V>>
Implementation of the SUBCLU algorithm, an algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace separately.
Reference:
K. Kailing, H.-P. Kriegel, P. Kroeger: Density connected Subspace Clustering
for High Dimensional Data.
In Proc. SIAM Int. Conf. on Data Mining (SDM'04), Lake Buena Vista, FL, 2004.
Modifier and Type | Class and Description |
---|---|
static class |
SUBCLU.Parameterizer<V extends NumberVector<?>>
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
static OptionID |
DISTANCE_FUNCTION_ID
The distance function to determine the distance between database objects.
|
private AbstractDimensionsSelectingDoubleDistanceFunction<V> |
distanceFunction
Holds the instance of the distance function specified by
DISTANCE_FUNCTION_ID . |
private DoubleDistance |
epsilon
Holds the value of
EPSILON_ID . |
static OptionID |
EPSILON_ID
Parameter to specify the maximum radius of the neighborhood to be
considered, must be suitable to
AbstractDimensionsSelectingDoubleDistanceFunction . |
private static Logging |
LOG
The logger for this class.
|
private int |
minpts
Holds the value of
MINPTS_ID . |
static OptionID |
MINPTS_ID
Parameter to specify the threshold for minimum number of points in the
epsilon-neighborhood of a point, must be an integer greater than 0.
|
private Clustering<SubspaceModel<V>> |
result
Holds the result;
|
Constructor and Description |
---|
SUBCLU(AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction,
DoubleDistance epsilon,
int minpts)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private Subspace |
bestSubspace(List<Subspace> subspaces,
Subspace candidate,
TreeMap<Subspace,List<Cluster<Model>>> clusterMap)
Determines the
d -dimensional subspace of the (d+1)
-dimensional candidate with minimal number of objects in the cluster. |
private List<Subspace> |
generateSubspaceCandidates(List<Subspace> subspaces)
Generates
d+1 -dimensional subspace candidates from the specified
d -dimensional subspaces. |
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
Clustering<SubspaceModel<V>> |
getResult()
Returns the result of the algorithm.
|
private List<Subspace> |
lowerSubspaces(Subspace subspace)
Returns the list of all
(d-1) -dimensional subspaces of the
specified d -dimensional subspace. |
Clustering<SubspaceModel<V>> |
run(Relation<V> relation)
Performs the SUBCLU algorithm on the given database.
|
private List<Cluster<Model>> |
runDBSCAN(Relation<V> relation,
DBIDs ids,
Subspace subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the
given subspace.
|
makeParameterDistanceFunction, run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
public static final OptionID DISTANCE_FUNCTION_ID
Default value: SubspaceEuclideanDistanceFunction
Key: -subclu.distancefunction
public static final OptionID EPSILON_ID
AbstractDimensionsSelectingDoubleDistanceFunction
.
Key: -subclu.epsilon
public static final OptionID MINPTS_ID
Key: -subclu.minpts
private AbstractDimensionsSelectingDoubleDistanceFunction<V extends NumberVector<?>> distanceFunction
DISTANCE_FUNCTION_ID
.private DoubleDistance epsilon
EPSILON_ID
.private int minpts
MINPTS_ID
.private Clustering<SubspaceModel<V extends NumberVector<?>>> result
public SUBCLU(AbstractDimensionsSelectingDoubleDistanceFunction<V> distanceFunction, DoubleDistance epsilon, int minpts)
distanceFunction
- Distance functionepsilon
- Epsilon valueminpts
- Minpts valuepublic Clustering<SubspaceModel<V>> run(Relation<V> relation)
relation
- Relation to processpublic Clustering<SubspaceModel<V>> getResult()
private List<Cluster<Model>> runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace)
ids
is null DBSCAN will be applied to
the whole database.relation
- the database holding the objects to run DBSCAN onids
- the IDs of the database defining the partition to run DBSCAN on
- if this parameter is null DBSCAN will be applied to the whole
databasesubspace
- the subspace to run DBSCAN onprivate List<Subspace> generateSubspaceCandidates(List<Subspace> subspaces)
d+1
-dimensional subspace candidates from the specified
d
-dimensional subspaces.subspaces
- the d
-dimensional subspacesd+1
-dimensional subspace candidatesprivate List<Subspace> lowerSubspaces(Subspace subspace)
(d-1)
-dimensional subspaces of the
specified d
-dimensional subspace.subspace
- the d
-dimensional subspace(d-1)
-dimensional subspacesprivate Subspace bestSubspace(List<Subspace> subspaces, Subspace candidate, TreeMap<Subspace,List<Cluster<Model>>> clusterMap)
d
-dimensional subspace of the (d+1)
-dimensional candidate with minimal number of objects in the cluster.subspaces
- the list of d
-dimensional subspaces containing
clusterscandidate
- the (d+1)
-dimensional candidate subspaceclusterMap
- the mapping of subspaces to clustersd
-dimensional subspace of the (d+1)
-dimensional candidate with minimal number of objects in the
clusterpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>