V
- the type of NumberVector handled by this algorithm@Title(value="SUBCLU: Density connected Subspace Clustering") @Description(value="Algorithm to detect arbitrarily shaped and positioned clusters in subspaces. SUBCLU delivers for each subspace the same clusters DBSCAN would have found, when applied to this subspace seperately.") @Reference(authors="Karin Kailing, Hans-Peter Kriegel, Peer Kr\u00f6ger", title="Density Connected Subspace Clustering for High Dimensional Data", booktitle="Proc. SIAM Int. Conf. on Data Mining (SDM\'04)", url="https://doi.org/10.1137/1.9781611972740.23", bibkey="DBLP:conf/sdm/KroegerKK04") public class SUBCLU<V extends NumberVector> extends AbstractAlgorithm<Clustering<SubspaceModel>> implements SubspaceClusteringAlgorithm<SubspaceModel>
Note: this implementation does not yet implemented the suggested indexing based on inverted files, and that we also only query subsets of them, this makes the implementation as a reusable index challenging.
The original paper is not very clear on which clusters to return, as any subspace cluster must be part of a lower-dimensional projected cluster, so these results would be highly redundant. In this implementation, we only include points in clusters that are not already part of sub-clusters (note that this does not remove overlap of independent subspaces).
Reference:
Karin Kailing, Hans-Peter Kriegel, Peer Kröger
Density Connected Subspace Clustering for High Dimensional Data
Proc. SIAM Int. Conf. on Data Mining (SDM'04)
Modifier and Type | Class and Description |
---|---|
static class |
SUBCLU.Parameterizer<V extends NumberVector>
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
protected DimensionSelectingSubspaceDistanceFunction<V> |
distanceFunction
The distance function to determine the distance between objects.
|
protected double |
epsilon
Maximum radius of the neighborhood to be considered.
|
private static Logging |
LOG
The logger for this class.
|
protected int |
mindim
Minimum dimensionality.
|
protected int |
minpts
Minimum number of points.
|
ALGORITHM_ID
Constructor and Description |
---|
SUBCLU(DimensionSelectingSubspaceDistanceFunction<V> distanceFunction,
double epsilon,
int minpts,
int mindim)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private Subspace |
bestSubspace(java.util.List<Subspace> subspaces,
Subspace candidate,
java.util.TreeMap<Subspace,java.util.List<Cluster<Model>>> clusterMap)
Determines the
d -dimensional subspace of the (d+1)
-dimensional candidate with minimal number of objects in the cluster. |
private boolean |
checkLower(Subspace candidate,
java.util.List<Subspace> subspaces)
Perform Apriori-style pruning.
|
private java.util.List<Subspace> |
generateSubspaceCandidates(java.util.List<Subspace> subspaces)
Generates
d+1 -dimensional subspace candidates from the specified
d -dimensional subspaces. |
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
Clustering<SubspaceModel> |
run(Relation<V> relation)
Performs the SUBCLU algorithm on the given database.
|
private java.util.List<Cluster<Model>> |
runDBSCAN(Relation<V> relation,
DBIDs ids,
Subspace subspace)
Runs the DBSCAN algorithm on the specified partition of the database in the
given subspace.
|
run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
protected DimensionSelectingSubspaceDistanceFunction<V extends NumberVector> distanceFunction
protected double epsilon
protected int minpts
protected int mindim
public SUBCLU(DimensionSelectingSubspaceDistanceFunction<V> distanceFunction, double epsilon, int minpts, int mindim)
distanceFunction
- Distance functionepsilon
- Epsilon valueminpts
- Minpts valuemindim
- Minimum dimensionalitypublic Clustering<SubspaceModel> run(Relation<V> relation)
relation
- Relation to processprivate java.util.List<Cluster<Model>> runDBSCAN(Relation<V> relation, DBIDs ids, Subspace subspace)
ids
is null DBSCAN will be applied to
the whole database.relation
- the database holding the objects to run DBSCAN onids
- the IDs of the database defining the partition to run DBSCAN on
- if this parameter is null DBSCAN will be applied to the whole
databasesubspace
- the subspace to run DBSCAN onprivate java.util.List<Subspace> generateSubspaceCandidates(java.util.List<Subspace> subspaces)
d+1
-dimensional subspace candidates from the specified
d
-dimensional subspaces.subspaces
- the d
-dimensional subspacesd+1
-dimensional subspace candidatesprivate boolean checkLower(Subspace candidate, java.util.List<Subspace> subspaces)
candidate
- Current candidatesubspaces
- Subspacestrue
if all lower-dimensional subspaces existprivate Subspace bestSubspace(java.util.List<Subspace> subspaces, Subspace candidate, java.util.TreeMap<Subspace,java.util.List<Cluster<Model>>> clusterMap)
d
-dimensional subspace of the (d+1)
-dimensional candidate with minimal number of objects in the cluster.subspaces
- the list of d
-dimensional subspaces containing
clusterscandidate
- the (d+1)
-dimensional candidate subspaceclusterMap
- the mapping of subspaces to clustersd
-dimensional subspace of the (d+1)
-dimensional candidate with minimal number of objects in the
clusterpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<Clustering<SubspaceModel>>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<Clustering<SubspaceModel>>
Copyright © 2019 ELKI Development Team. License information.