de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class CLIQUE<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<Clustering<SubspaceModel<V>>>
      extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.CLIQUE<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<SubspaceModel<V>>>, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="CLIQUE: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications")
@Description(value="Grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.")
@Reference(authors="R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan",
           title="Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications",
           booktitle="Proc. SIGMOD Conference, Seattle, WA, 1998",
           url="http://dx.doi.org/10.1145/276304.276314")
public class CLIQUE<V extends NumberVector<V,?>>
extends AbstractAlgorithm<Clustering<SubspaceModel<V>>>
implements ClusteringAlgorithm<Clustering<SubspaceModel<V>>>

Implementation of the CLIQUE algorithm, a grid-based algorithm to identify dense clusters in subspaces of maximum dimensionality.

The implementation consists of two steps:
1. Identification of subspaces that contain clusters
2. Identification of clusters

The third step of the original algorithm (Generation of minimal description for the clusters) is not (yet) implemented.

Reference:
R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.
In Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, 1998.


Nested Class Summary
static class CLIQUE.Parameterizer<V extends NumberVector<V,?>>
          Parameterization class.
 
Field Summary
private static Logging logger
          The logger for this class.
private  boolean prune
          Holds the value of PRUNE_ID.
static OptionID PRUNE_ID
          Flag to indicate that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.
private  double tau
          Holds the value of TAU_ID.
static OptionID TAU_ID
          Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.
private  int xsi
          Holds the value of XSI_ID.
static OptionID XSI_ID
          Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.
 
Constructor Summary
CLIQUE(int xsi, double tau, boolean prune)
          Constructor.
 
Method Summary
private  double[][] computeDiffs(List<CLIQUESubspace<V>> denseSubspaces, int[] mi, int[] mp)
          The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P.
private  int[][] computeMeans(List<CLIQUESubspace<V>> denseSubspaces)
          The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P.
private  List<Pair<Subspace<V>,ModifiableDBIDs>> determineClusters(List<CLIQUESubspace<V>> denseSubspaces)
          Determines the clusters in the specified dense subspaces.
private  List<CLIQUESubspace<V>> findDenseSubspaceCandidates(Relation<V> database, List<CLIQUESubspace<V>> denseSubspaces)
          Determines the k-dimensional dense subspace candidates from the specified (k-1)-dimensional dense subspaces.
private  List<CLIQUESubspace<V>> findDenseSubspaces(Relation<V> database, List<CLIQUESubspace<V>> denseSubspaces)
          Determines the k-dimensional dense subspaces and performs a pruning if this option is chosen.
private  List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Relation<V> database)
          Determines the one-dimensional dense subspace candidates by making a pass over the database.
private  List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Relation<V> database)
          Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
private  Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Relation<V> database)
          Initializes and returns the one dimensional units.
private  List<CLIQUESubspace<V>> pruneDenseSubspaces(List<CLIQUESubspace<V>> denseSubspaces)
          Performs a MDL-based pruning of the specified dense subspaces as described in the CLIQUE algorithm.
 Clustering<SubspaceModel<V>> run(Relation<V> relation)
          Performs the CLIQUE algorithm on the given database.
private  void updateMinMax(V featureVector, double[] minima, double[] maxima)
          Updates the minima and maxima array according to the specified feature vector.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 

Field Detail

logger

private static final Logging logger
The logger for this class.


XSI_ID

public static final OptionID XSI_ID
Parameter to specify the number of intervals (units) in each dimension, must be an integer greater than 0.

Key: -clique.xsi


TAU_ID

public static final OptionID TAU_ID
Parameter to specify the density threshold for the selectivity of a unit, where the selectivity is the fraction of total feature vectors contained in this unit, must be a double greater than 0 and less than 1.

Key: -clique.tau


PRUNE_ID

public static final OptionID PRUNE_ID
Flag to indicate that only subspaces with large coverage (i.e. the fraction of the database that is covered by the dense units) are selected, the rest will be pruned.

Key: -clique.prune


xsi

private int xsi
Holds the value of XSI_ID.


tau

private double tau
Holds the value of TAU_ID.


prune

private boolean prune
Holds the value of PRUNE_ID.

Constructor Detail

CLIQUE

public CLIQUE(int xsi,
              double tau,
              boolean prune)
Constructor.

Parameters:
xsi - Xsi value
tau - Tau value
prune - Prune flag
Method Detail

run

public Clustering<SubspaceModel<V>> run(Relation<V> relation)
Performs the CLIQUE algorithm on the given database.

Parameters:
relation - Data relation to process
Returns:
Clustering result

determineClusters

private List<Pair<Subspace<V>,ModifiableDBIDs>> determineClusters(List<CLIQUESubspace<V>> denseSubspaces)
Determines the clusters in the specified dense subspaces.

Parameters:
denseSubspaces - the dense subspaces in reverse order by their coverage
Returns:
the clusters in the specified dense subspaces and the corresponding cluster models

findOneDimensionalDenseSubspaces

private List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaces(Relation<V> database)
Determines the one dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
Returns:
the one dimensional dense subspaces reverse ordered by their coverage

findDenseSubspaces

private List<CLIQUESubspace<V>> findDenseSubspaces(Relation<V> database,
                                                   List<CLIQUESubspace<V>> denseSubspaces)
Determines the k-dimensional dense subspaces and performs a pruning if this option is chosen.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspaces
Returns:
a list of the k-dimensional dense subspaces sorted in reverse order by their coverage

initOneDimensionalUnits

private Collection<CLIQUEUnit<V>> initOneDimensionalUnits(Relation<V> database)
Initializes and returns the one dimensional units.

Parameters:
database - the database to run the algorithm on
Returns:
the created one dimensional units

updateMinMax

private void updateMinMax(V featureVector,
                          double[] minima,
                          double[] maxima)
Updates the minima and maxima array according to the specified feature vector.

Parameters:
featureVector - the feature vector
minima - the array of minima
maxima - the array of maxima

findOneDimensionalDenseSubspaceCandidates

private List<CLIQUESubspace<V>> findOneDimensionalDenseSubspaceCandidates(Relation<V> database)
Determines the one-dimensional dense subspace candidates by making a pass over the database.

Parameters:
database - the database to run the algorithm on
Returns:
the one-dimensional dense subspace candidates reverse ordered by their coverage

findDenseSubspaceCandidates

private List<CLIQUESubspace<V>> findDenseSubspaceCandidates(Relation<V> database,
                                                            List<CLIQUESubspace<V>> denseSubspaces)
Determines the k-dimensional dense subspace candidates from the specified (k-1)-dimensional dense subspaces.

Parameters:
database - the database to run the algorithm on
denseSubspaces - the (k-1)-dimensional dense subspaces
Returns:
a list of the k-dimensional dense subspace candidates reverse ordered by their coverage

pruneDenseSubspaces

private List<CLIQUESubspace<V>> pruneDenseSubspaces(List<CLIQUESubspace<V>> denseSubspaces)
Performs a MDL-based pruning of the specified dense subspaces as described in the CLIQUE algorithm.

Parameters:
denseSubspaces - the subspaces to be pruned sorted in reverse order by their coverage
Returns:
the subspaces which are not pruned reverse ordered by their coverage

computeMeans

private int[][] computeMeans(List<CLIQUESubspace<V>> denseSubspaces)
The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P. For each set the mean of the cover fractions is computed.

Parameters:
denseSubspaces - the dense subspaces in reverse order by their coverage
Returns:
the mean of the cover fractions, the first value is the mean of the selected set I, the second value is the mean of the pruned set P.

computeDiffs

private double[][] computeDiffs(List<CLIQUESubspace<V>> denseSubspaces,
                                int[] mi,
                                int[] mp)
The specified sorted list of dense subspaces is divided into the selected set I and the pruned set P. For each set the difference from the specified mean values is computed.

Parameters:
denseSubspaces - denseSubspaces the dense subspaces in reverse order by their coverage
mi - the mean of the selected sets I
mp - the mean of the pruned sets P
Returns:
the difference from the specified mean values, the first value is the difference from the mean of the selected set I, the second value is the difference from the mean of the pruned set P.

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<V,?>>>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)