de.lmu.ifi.dbs.elki.algorithm.clustering.correlation
Class ORCLUS<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<R>
      extended by de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering<Clustering<Model>,V>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.ORCLUS<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<Model>>, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="ORCLUS: Arbitrarily ORiented projected CLUSter generation")
@Description(value="Algorithm to find correlation clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, P. S. Yu",
           title="Finding Generalized Projected Clusters in High Dimensional Spaces",
           booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'00)",
           url="http://dx.doi.org/10.1145/342009.335383")
public class ORCLUS<V extends NumberVector<V,?>>
extends AbstractProjectedClustering<Clustering<Model>,V>

ORCLUS provides the ORCLUS algorithm, an algorithm to find clusters in high dimensional spaces.

Reference: C. C. Aggarwal, P. S. Yu: Finding Generalized Projected Clusters in High Dimensional Spaces.
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '00).


Nested Class Summary
private  class ORCLUS.ORCLUSCluster
          Encapsulates the attributes of a cluster.
static class ORCLUS.Parameterizer<V extends NumberVector<V,?>>
          Parameterization class.
private  class ORCLUS.ProjectedEnergy
          Encapsulates the projected energy for a cluster.
 
Field Summary
private  double alpha
          Holds the value of ALPHA_ID.
static OptionID ALPHA_ID
          Parameter to specify the factor for reducing the number of current clusters in each iteration, must be an integer greater than 0 and less than 1.
private static Logging logger
          The logger for this class.
private  PCARunner<V> pca
          The PCA utility object.
private  Long seed
          Holds the value of SEED_ID.
static OptionID SEED_ID
          Parameter to specify the random generator seed.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering
k, k_i, K_I_ID, K_ID, l, L_ID
 
Constructor Summary
ORCLUS(int k, int k_i, int l, double alpha, long seed, PCARunner<V> pca)
          Java constructor.
 
Method Summary
private  void assign(Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, List<ORCLUS.ORCLUSCluster> clusters)
          Creates a partitioning of the database by assigning each object to its closest seed.
private  Matrix findBasis(Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, ORCLUS.ORCLUSCluster cluster, int dim)
          Finds the basis of the subspace of dimensionality dim for the specified cluster.
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
private  List<ORCLUS.ORCLUSCluster> initialSeeds(Relation<V> database, int k)
          Initializes the list of seeds wit a random sample of size k.
private  void merge(Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, List<ORCLUS.ORCLUSCluster> clusters, int k_new, int d_new, IndefiniteProgress cprogress)
          Reduces the number of seeds to k_new
private  ORCLUS.ProjectedEnergy projectedEnergy(Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, ORCLUS.ORCLUSCluster c_i, ORCLUS.ORCLUSCluster c_j, int i, int j, int dim)
          Computes the projected energy of the specified clusters.
private  V projection(ORCLUS.ORCLUSCluster c, V o, V factory)
          Returns the projection of real vector o in the subspace of cluster c.
 Clustering<Model> run(Database database, Relation<V> relation)
          Performs the ORCLUS algorithm on the given database.
private  ORCLUS.ORCLUSCluster union(Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, ORCLUS.ORCLUSCluster c1, ORCLUS.ORCLUSCluster c2, int dim)
          Returns the union of the two specified clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering
getDistanceFunction, getDistanceQuery
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 

Field Detail

logger

private static final Logging logger
The logger for this class.


ALPHA_ID

public static final OptionID ALPHA_ID
Parameter to specify the factor for reducing the number of current clusters in each iteration, must be an integer greater than 0 and less than 1.

Default value: 0.5

Key: -orclus.alpha


SEED_ID

public static final OptionID SEED_ID
Parameter to specify the random generator seed.


alpha

private double alpha
Holds the value of ALPHA_ID.


seed

private Long seed
Holds the value of SEED_ID.


pca

private PCARunner<V extends NumberVector<V,?>> pca
The PCA utility object.

Constructor Detail

ORCLUS

public ORCLUS(int k,
              int k_i,
              int l,
              double alpha,
              long seed,
              PCARunner<V> pca)
Java constructor.

Parameters:
k - k Parameter
k_i - k_i Parameter
l - l Parameter
alpha - Alpha Parameter
seed - Seed parameter
pca - PCA runner
Method Detail

run

public Clustering<Model> run(Database database,
                             Relation<V> relation)
                      throws IllegalStateException
Performs the ORCLUS algorithm on the given database.

Throws:
IllegalStateException

initialSeeds

private List<ORCLUS.ORCLUSCluster> initialSeeds(Relation<V> database,
                                                int k)
Initializes the list of seeds wit a random sample of size k.

Parameters:
database - the database holding the objects
k - the size of the random sample
Returns:
the initial seed list

assign

private void assign(Relation<V> database,
                    DistanceQuery<V,DoubleDistance> distFunc,
                    List<ORCLUS.ORCLUSCluster> clusters)
Creates a partitioning of the database by assigning each object to its closest seed.

Parameters:
database - the database holding the objects
distFunc - distance function
clusters - the array of clusters to which the objects should be assigned to

findBasis

private Matrix findBasis(Relation<V> database,
                         DistanceQuery<V,DoubleDistance> distFunc,
                         ORCLUS.ORCLUSCluster cluster,
                         int dim)
Finds the basis of the subspace of dimensionality dim for the specified cluster.

Parameters:
database - the database to run the algorithm on
distFunc - the distance function
cluster - the cluster
dim - the dimensionality of the subspace
Returns:
matrix defining the basis of the subspace for the specified cluster

merge

private void merge(Relation<V> database,
                   DistanceQuery<V,DoubleDistance> distFunc,
                   List<ORCLUS.ORCLUSCluster> clusters,
                   int k_new,
                   int d_new,
                   IndefiniteProgress cprogress)
Reduces the number of seeds to k_new

Parameters:
database - the database holding the objects
distFunc - the distance function
clusters - the set of current seeds
k_new - the new number of seeds
d_new - the new dimensionality of the subspaces for each seed

projectedEnergy

private ORCLUS.ProjectedEnergy projectedEnergy(Relation<V> database,
                                               DistanceQuery<V,DoubleDistance> distFunc,
                                               ORCLUS.ORCLUSCluster c_i,
                                               ORCLUS.ORCLUSCluster c_j,
                                               int i,
                                               int j,
                                               int dim)
Computes the projected energy of the specified clusters. The projected energy is given by the mean square distance of the points to the centroid of the union cluster c, when all points in c are projected to the subspace of c.

Parameters:
database - the database holding the objects
distFunc - the distance function
c_i - the first cluster
c_j - the second cluster
i - the index of cluster c_i in the cluster list
j - the index of cluster c_j in the cluster list
dim - the dimensionality of the clusters
Returns:
the projected energy of the specified cluster

union

private ORCLUS.ORCLUSCluster union(Relation<V> database,
                                   DistanceQuery<V,DoubleDistance> distFunc,
                                   ORCLUS.ORCLUSCluster c1,
                                   ORCLUS.ORCLUSCluster c2,
                                   int dim)
Returns the union of the two specified clusters.

Parameters:
database - the database holding the objects
distFunc - the distance function
c1 - the first cluster
c2 - the second cluster
dim - the dimensionality of the union cluster
Returns:
the union of the two specified clusters

projection

private V projection(ORCLUS.ORCLUSCluster c,
                     V o,
                     V factory)
Returns the projection of real vector o in the subspace of cluster c.

Parameters:
c - the cluster
o - the double vector
factory - Factory object / prototype
Returns:
the projection of double vector o in the subspace of cluster c

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<Model>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<Clustering<Model>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)