V - the type of NumberVector handled by this Algorithm.

@Title(value="DOC: Density-based Optimal projective Clustering")
@Reference(authors="C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali", title="A Monte Carlo algorithm for fast projective clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'02)", url="https://doi.org/10.1145/564691.564739", bibkey="DBLP:conf/sigmod/ProcopiucJAM02")
public class DOC<V extends NumberVector>
extends AbstractAlgorithm<Clustering<SubspaceModel>>
implements SubspaceClusteringAlgorithm<SubspaceModel>
Reference:
C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali
A Monte Carlo algorithm for fast projective clustering
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02).
Modifier and Type | Class and Description |
---|---|
static class | DOC.Parameterizer<V extends NumberVector>: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
protected double | alpha: Relative density threshold parameter alpha. |
protected double | beta: Balancing parameter for importance of points vs. dimensions. |
private static Logging | LOG: The logger for this class. |
protected RandomFactory | rnd: Randomizer used internally for sampling points. |
protected double | w: Half width parameter. |
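
To make the role of beta concrete, the following minimal Java sketch evaluates the quality function μ(a, b) = a · (1/β)^b from the referenced paper, where a is the cluster size and b the number of relevant dimensions. The class name ClusterQualitySketch and the method quality are hypothetical names chosen for illustration; this is not the ELKI implementation.

```java
final class ClusterQualitySketch {
  /**
   * Quality function mu(a, b) = a * (1/beta)^b.
   * Hypothetical helper for illustration; not part of ELKI.
   */
  static double quality(int clusterSize, int numRelevantDimensions, double beta) {
    return clusterSize * Math.pow(1.0 / beta, numRelevantDimensions);
  }

  public static void main(String[] args) {
    double beta = 0.25;
    // A large cluster in few dimensions vs. a smaller one in more dimensions.
    System.out.println(quality(100, 2, beta)); // prints 1600.0
    System.out.println(quality(30, 3, beta)); // prints 1920.0
  }
}
```

With beta = 0.25, each additional relevant dimension multiplies the quality by 4, so the 30-point cluster in 3 dimensions scores higher than the 100-point cluster in 2 dimensions. Values of beta closer to 1 shift the balance toward cluster size.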
Constructor and Description |
---|
DOC(double alpha, double beta, double w, RandomFactory random): Constructor. |
Modifier and Type | Method and Description |
---|---|
protected double | computeClusterQuality(int clusterSize, int numRelevantDimensions): Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper. |
protected boolean | dimensionIsRelevant(int dimension, Relation<V> relation, DBIDs points): Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold). |
protected DBIDs | findNeighbors(DBIDRef q, long[] nD, ArrayModifiableDBIDs S, Relation<V> relation): Find the neighbors of point q in the given subspace. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
protected Cluster<SubspaceModel> | makeCluster(Relation<V> relation, DBIDs C, long[] D): Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes. |
Clustering<SubspaceModel> | run(Database database, Relation<V> relation): Performs the DOC or FastDOC (as configured) algorithm on the given Database. |
protected Cluster<SubspaceModel> | runDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize): Performs a single run of DOC, finding a single cluster. |
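
The relevance test and the subspace neighbor search summarized above can be sketched on plain arrays. This is a minimal illustration under stated assumptions, not the ELKI code: the class SubspaceBoxSketch, the use of int indices in place of DBIDs, and the bit layout of the long[] mask (64 dimensions per long) are all hypothetical. The summary speaks of variance; the sketch instead compares the value range (maximum minus minimum) against the half width w, which is an assumption about the criterion.

```java
import java.util.ArrayList;
import java.util.List;

final class SubspaceBoxSketch {
  /** True if bit d is set in the mask (assumed layout: 64 dimensions per long). */
  static boolean isSet(long[] mask, int d) {
    return (mask[d >>> 6] & (1L << (d & 63))) != 0;
  }

  /** Assumed criterion: the reference points spread over at most w in this dimension. */
  static boolean dimensionIsRelevant(int dim, double[][] data, int[] points, double w) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    for (int p : points) {
      double v = data[p][dim];
      min = Math.min(min, v);
      max = Math.max(max, v);
      if (max - min > w) {
        return false; // spread already exceeds the threshold
      }
    }
    return true;
  }

  /** Candidates that lie within w of point q in every dimension selected by the mask. */
  static List<Integer> findNeighbors(int q, long[] mask, int[] candidates,
      double[][] data, double w) {
    List<Integer> out = new ArrayList<>();
    for (int c : candidates) {
      boolean inside = true;
      for (int d = 0; d < data[c].length && inside; d++) {
        if (isSet(mask, d) && Math.abs(data[c][d] - data[q][d]) > w) {
          inside = false;
        }
      }
      if (inside) {
        out.add(c);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    double[][] data = { { 1.0, 10.0 }, { 1.2, 50.0 }, { 0.9, 90.0 }, { 5.0, 11.0 } };
    int[] sample = { 0, 1, 2 };
    double w = 0.5;
    System.out.println(dimensionIsRelevant(0, data, sample, w)); // prints true
    System.out.println(dimensionIsRelevant(1, data, sample, w)); // prints false
    long[] onlyDim0 = { 1L }; // subspace mask selecting dimension 0
    System.out.println(findNeighbors(0, onlyDim0, new int[] { 0, 1, 2, 3 }, data, w)); // prints [0, 1, 2]
  }
}
```

Dimensions not selected by the mask are ignored entirely, so points 1 and 2 count as neighbors of point 0 even though they are far away in dimension 1.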
private static final Logging LOG
protected double alpha
protected double beta
protected double w
protected RandomFactory rnd
public DOC(double alpha, double beta, double w, RandomFactory random)
alpha - α relative density threshold.
beta - β balancing parameter for size vs. dimensionality.
w - half width parameter.
random - Random factory

public Clustering<SubspaceModel> run(Database database, Relation<V> relation)
database - Database
relation - Data relation

protected Cluster<SubspaceModel> runDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize)
database - Database context
relation - used to get actual values for DBIDs.
S - The set of points we're working on.
d - Dimensionality of the data set we're currently working on.
r - Size of random samples.
m - Number of inner iterations (per seed point).
n - Number of outer iterations (seed points).
minClusterSize - Minimum size a cluster must have to be accepted.
Returns: the cluster found, or null if no cluster was accepted.

protected DBIDs findNeighbors(DBIDRef q, long[] nD, ArrayModifiableDBIDs S, Relation<V> relation)
q - Query point
nD - Subspace mask
S - Remaining data points
relation - Data relation

protected boolean dimensionIsRelevant(int dimension, Relation<V> relation, DBIDs points)
dimension - the dimension to test.
relation - used to get actual values for DBIDs.
points - the points to test.
Returns: true if the dimension is relevant.

protected Cluster<SubspaceModel> makeCluster(Relation<V> relation, DBIDs C, long[] D)
relation - to compute a centroid.
C - the cluster points.
D - the relevant dimensions.

protected double computeClusterQuality(int clusterSize, int numRelevantDimensions)
clusterSize - the size of the cluster.
numRelevantDimensions - the number of dimensions relevant to the cluster.

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>
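
The interplay of the runDOC parameters n (outer iterations, one seed point each), m (inner iterations per seed point), r (sample size) and minClusterSize can be sketched as a Monte Carlo loop in the spirit of the referenced paper. This is a simplified illustration on plain arrays, not the ELKI implementation: the class DocRunSketch, the holder Found, the method runOnce and the data in demoData are hypothetical, samples are drawn with replacement for brevity, and skipping trials with no relevant dimension is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class DocRunSketch {
  /** Hypothetical result holder: member indices, relevant-dimension flags, quality. */
  static final class Found {
    final List<Integer> members;
    final boolean[] dims;
    final double quality;

    Found(List<Integer> members, boolean[] dims, double quality) {
      this.members = members;
      this.dims = dims;
      this.quality = quality;
    }
  }

  /** One run: returns the best cluster seen, or null if none was accepted. */
  static Found runOnce(double[][] data, int n, int m, int r, int minClusterSize,
      double w, double beta, Random rnd) {
    final int d = data[0].length;
    Found best = null;
    for (int i = 0; i < n; i++) { // outer iterations: one random seed point each
      double[] p = data[rnd.nextInt(data.length)];
      for (int j = 0; j < m; j++) { // inner iterations: one random sample each
        boolean[] dims = new boolean[d];
        Arrays.fill(dims, true);
        int numDims = d;
        for (int k = 0; k < r; k++) { // r sample points, drawn with replacement
          double[] x = data[rnd.nextInt(data.length)];
          for (int a = 0; a < d; a++) {
            if (dims[a] && Math.abs(x[a] - p[a]) > w) {
              dims[a] = false; // a sample point is too far from the seed here
              numDims--;
            }
          }
        }
        if (numDims == 0) {
          continue; // assumption: trials without any relevant dimension are skipped
        }
        List<Integer> members = new ArrayList<>();
        for (int q = 0; q < data.length; q++) { // box of half width w around the seed
          boolean inside = true;
          for (int a = 0; a < d && inside; a++) {
            inside = !dims[a] || Math.abs(data[q][a] - p[a]) <= w;
          }
          if (inside) {
            members.add(q);
          }
        }
        if (members.size() < minClusterSize) {
          continue; // too small to be accepted
        }
        double quality = members.size() * Math.pow(1.0 / beta, numDims);
        if (best == null || quality > best.quality) {
          best = new Found(members, dims, quality);
        }
      }
    }
    return best;
  }

  /** Six points that agree in dimension 0 only, plus two isolated points. */
  static double[][] demoData() {
    return new double[][] { { 1.0, 0.0 }, { 1.1, 10.0 }, { 0.9, 20.0 }, { 1.2, 30.0 },
        { 1.05, 40.0 }, { 0.95, 50.0 }, { 9.0, 60.0 }, { 15.0, 70.0 } };
  }

  public static void main(String[] args) {
    Found f = runOnce(demoData(), 40, 20, 3, 3, 0.5, 0.25, new Random(42L));
    System.out.println(f == null ? "no cluster" : f.members + " " + Arrays.toString(f.dims));
  }
}
```

On the demo data, the only candidate that can reach the minimum size of 3 is the group of six points that agree in dimension 0, so any accepted result has that dimension as its sole relevant one. Raising n and m lowers the chance that every trial misses it.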
Copyright © 2019 ELKI Development Team. License information.