DOC (ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures)

java.lang.Object
- de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<Clustering<SubspaceModel>>
- - de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DOC<V>

Type Parameters: V - the type of NumberVector handled by this Algorithm.

All Implemented Interfaces: Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel>>, SubspaceClusteringAlgorithm<SubspaceModel>

@Title(value="DOC: Density-based Optimal projective Clustering")
@Reference(authors="C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali",
           title="A Monte Carlo algorithm for fast projective clustering",
           booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'02)",
           url="http://dx.doi.org/10.1145/564691.564739")
public class DOC<V extends NumberVector>
extends AbstractAlgorithm<Clustering<SubspaceModel>>
implements SubspaceClusteringAlgorithm<SubspaceModel>

The DOC algorithm and its heuristic variant, FastDOC. DOC is a sampling-based subspace clustering algorithm.

Reference:
C. M. Procopiuc, M. Jones, P. K. Agarwal, T. M. Murali
A Monte Carlo algorithm for fast projective clustering.
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '02).

Author: Florian Nuecke

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`DOC.Parameterizer<V extends NumberVector>` Parameterization class.

Field Summary

Fields
Modifier and Type	Field and Description
`private double`	`alpha` Relative density threshold parameter alpha.
`private double`	`beta` Balancing parameter for importance of points vs. dimensions.
`private int`	`d_zero` Holds the value of `DOC.Parameterizer.D_ZERO_ID`.
`private boolean`	`heuristics` Holds the value of `DOC.Parameterizer.HEURISTICS_ID`.
`private static Logging`	`LOG` The logger for this class.
`private RandomFactory`	`rnd` Randomizer used internally for sampling points.
`private double`	`w` Half width parameter.

Constructor Summary

Constructors
Constructor and Description
`DOC(double alpha, double beta, double w, boolean heuristics, int d_zero, RandomFactory random)` Constructor.

Method Summary

Methods
Modifier and Type	Method and Description
`private double`	`computeClusterQuality(int clusterSize, int numRelevantDimensions)` Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper.
`private boolean`	`dimensionIsRelevant(int dimension, Relation<V> relation, DBIDs points)` Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold).
`TypeInformation[]`	`getInputTypeRestriction()` Get the input type restriction used for negotiating the data query.
`protected Logging`	`getLogger()` Get the (STATIC) logger for this class.
`private Cluster<SubspaceModel>`	`makeCluster(Relation<V> relation, DBIDs C, long[] D)` Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes.
`Clustering<SubspaceModel>`	`run(Database database, Relation<V> relation)` Performs the DOC or FastDOC (as configured) algorithm on the given Database.
`private Cluster<SubspaceModel>`	`runDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r, int minClusterSize)` Performs a single run of DOC, finding a single cluster.
`private Cluster<SubspaceModel>`	`runFastDOC(Database database, Relation<V> relation, ArrayModifiableDBIDs S, int d, int n, int m, int r)` Performs a single run of FastDOC, finding a single cluster.

Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run

- Field Detail
  - LOG
```
private static final Logging LOG
```
    The logger for this class.
  - alpha
```
private double alpha
```
    Relative density threshold parameter alpha.
  - beta
```
private double beta
```
    Balancing parameter for importance of points vs. dimensions.
  - w
```
private double w
```
    Half width parameter.
  - heuristics
```
private boolean heuristics
```
    Holds the value of DOC.Parameterizer.HEURISTICS_ID.
  - d_zero
```
private int d_zero
```
    Holds the value of DOC.Parameterizer.D_ZERO_ID.
  - rnd
```
private RandomFactory rnd
```
    Randomizer used internally for sampling points.
- Constructor Detail
  - DOC
```
public DOC(double alpha,
   double beta,
   double w,
   boolean heuristics,
   int d_zero,
   RandomFactory random)
```
    Constructor.
    
    Parameters:
    alpha - α relative density threshold.
    beta - β balancing parameter for size vs. dimensionality.
    w - w half width parameter.
    heuristics - whether to use heuristics (FastDOC) or not.
    d_zero - value of DOC.Parameterizer.D_ZERO_ID.
    random - Random factory.
- Method Detail
  - run
```
public Clustering<SubspaceModel> run(Database database,
                            Relation<V> relation)
```
    Performs the DOC or FastDOC (as configured) algorithm on the given Database.
    The algorithm runs exhaustively: it repeats DOC until no further cluster is found or the number of remaining points has dropped below the minimum cluster size.
    
    Parameters:
    database - Database
    relation - Data relation
  - runDOC
```
private Cluster<SubspaceModel> runDOC(Database database,
                            Relation<V> relation,
                            ArrayModifiableDBIDs S,
                            int d,
                            int n,
                            int m,
                            int r,
                            int minClusterSize)
```
    Performs a single run of DOC, finding a single cluster.
    
    Parameters:
    database - Database context
    relation - used to get actual values for DBIDs.
    S - The set of points we're working on.
    d - Dimensionality of the data set we're currently working on.
    r - Size of random samples.
    m - Number of inner iterations (per seed point).
    n - Number of outer iterations (seed points).
    minClusterSize - Minimum size a cluster must have to be accepted.
    
    Returns:
    a cluster, if one is found, else null.
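
    The iteration counts n, m and r are not explained further on this page. As a rough sketch, assuming the bounds given in the referenced paper (n = 2/α outer iterations, samples of size r = log(2d) / log(1/(2β)), and m = (2/α)^r · ln 4 inner iterations), they could be derived as follows. The class and method names are hypothetical, and rounding up is an assumption; the actual implementation may round differently.

```java
/** Hypothetical helper, not part of ELKI: derives DOC iteration counts from alpha, beta and d. */
final class DocIterationSketch {
  /** Number of outer iterations (seed points): n = 2 / alpha, rounded up (assumption). */
  static int outerIterations(double alpha) {
    if (alpha <= 0) {
      throw new IllegalArgumentException("alpha must be positive");
    }
    return (int) Math.ceil(2.0 / alpha);
  }

  /** Size of the random samples: r = log(2d) / log(1 / (2 beta)), rounded up (assumption). */
  static int sampleSize(int d, double beta) {
    // The formula needs a positive denominator, so this sketch requires 0 < beta < 0.5.
    if (d < 1 || beta <= 0 || beta >= 0.5) {
      throw new IllegalArgumentException("need d >= 1 and 0 < beta < 0.5");
    }
    return (int) Math.ceil(Math.log(2.0 * d) / Math.log(1.0 / (2.0 * beta)));
  }

  /** Number of inner iterations per seed point: m = (2 / alpha)^r * ln 4, rounded up (assumption). */
  static int innerIterations(double alpha, int r) {
    if (alpha <= 0 || r < 0) {
      throw new IllegalArgumentException("need alpha > 0 and r >= 0");
    }
    return (int) Math.ceil(Math.pow(2.0 / alpha, r) * Math.log(4.0));
  }
}
```

    Because m grows exponentially in r, small values of alpha or values of beta close to 0.5 make the exact DOC variant expensive, which is the motivation for the FastDOC heuristic.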
  - runFastDOC
```
private Cluster<SubspaceModel> runFastDOC(Database database,
                                Relation<V> relation,
                                ArrayModifiableDBIDs S,
                                int d,
                                int n,
                                int m,
                                int r)
```
    Performs a single run of FastDOC, finding a single cluster.
    
    Parameters:
    database - Database context
    relation - used to get actual values for DBIDs.
    S - The set of points we're working on.
    d - Dimensionality of the data set we're currently working on.
    r - Size of random samples.
    m - Number of inner iterations (per seed point).
    n - Number of outer iterations (seed points).
    
    Returns:
    a cluster, if one is found, else null.
  - dimensionIsRelevant
```
private boolean dimensionIsRelevant(int dimension,
                          Relation<V> relation,
                          DBIDs points)
```
    Utility method to test if a given dimension is relevant as determined via a set of reference points (i.e. if the variance along the attribute is lower than the threshold).
    
    Parameters:
    dimension - the dimension to test.
    relation - used to get actual values for DBIDs.
    points - the points to test.
    
    Returns:
    true if the dimension is relevant.
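
    To illustrate the relevance test described above, here is a minimal sketch on plain arrays. It assumes that "lower than the threshold" means the spread (maximum minus minimum) of the reference points along the dimension does not exceed the width parameter w; this interpretation, the class name and the array-based signature are assumptions for illustration, not the ELKI implementation.

```java
/** Hypothetical helper, not part of ELKI: spread-based relevance test on plain arrays. */
final class DimensionRelevanceSketch {
  /**
   * Returns true if all points lie within a window of width w along the given dimension.
   *
   * @param points reference points, one double[] per point
   * @param dimension index of the dimension to test
   * @param w maximum allowed spread along the dimension (assumed threshold)
   */
  static boolean isRelevant(double[][] points, int dimension, double w) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double[] p : points) {
      double v = p[dimension];
      min = Math.min(min, v);
      max = Math.max(max, v);
      // Stop early as soon as the spread exceeds the threshold.
      if (max - min > w) {
        return false;
      }
    }
    return true;
  }
}
```

    For the points (0.0, 0.0), (0.1, 5.0), (0.2, -3.0) and w = 0.5, dimension 0 counts as relevant in this sketch while dimension 1 does not.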
  - makeCluster
```
private Cluster<SubspaceModel> makeCluster(Relation<V> relation,
                                 DBIDs C,
                                 long[] D)
```
    Utility method to create a subspace cluster from a list of DBIDs and the relevant attributes.
    
    Parameters:
    relation - to compute a centroid.
    C - the cluster points.
    D - the relevant dimensions.
    
    Returns:
    an object representing the subspace cluster.
  - computeClusterQuality
```
private double computeClusterQuality(int clusterSize,
                           int numRelevantDimensions)
```
    Computes the quality of a cluster based on its size and number of relevant attributes, as described via the μ-function from the paper.
    
    Parameters:
    clusterSize - the size of the cluster.
    numRelevantDimensions - the number of dimensions relevant to the cluster.
    
    Returns:
    a quality measure (only use this to compare the quality to that of other clusters).
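
    As a sketch of the μ-function mentioned above, assuming the form given in the referenced paper, μ(a, b) = a · (1/β)^b, where a is the cluster size and b the number of relevant dimensions: the class name and the explicit beta argument are hypothetical (in DOC, beta is a field of the class).

```java
/** Hypothetical helper, not part of ELKI: the quality function mu(a, b) = a * (1/beta)^b. */
final class DocQualitySketch {
  /**
   * @param clusterSize number of points in the cluster (a)
   * @param numRelevantDimensions number of relevant dimensions (b)
   * @param beta balancing parameter; must be positive
   * @return a score that is only meaningful when compared to the score of another cluster
   */
  static double quality(int clusterSize, int numRelevantDimensions, double beta) {
    if (beta <= 0) {
      throw new IllegalArgumentException("beta must be positive");
    }
    return clusterSize * Math.pow(1.0 / beta, numRelevantDimensions);
  }
}
```

    With beta = 0.25, a cluster of 10 points in 3 relevant dimensions scores 640, which beats a cluster of 30 points in 2 relevant dimensions (480): smaller values of beta favor clusters with more relevant dimensions over clusters with more points.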
  - getInputTypeRestriction
```
public TypeInformation[] getInputTypeRestriction()
```
    Description copied from class: AbstractAlgorithm
    
    Get the input type restriction used for negotiating the data query.
    
    Specified by:
    
    getInputTypeRestriction in interface Algorithm
    
    Specified by:
    
    getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>
    
    Returns:
    Type restriction
  - getLogger
```
protected Logging getLogger()
```
    Description copied from class: AbstractAlgorithm
    
    Get the (STATIC) logger for this class.
    
    Specified by:
    
    getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>
    
    Returns:
    the static logger
