de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Class PROCLUS<V extends NumberVector<V,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<R>
      extended by de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering<Clustering<Model>,V>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.PROCLUS<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<Model>>, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park",
           title="Fast Algorithms for Projected Clustering",
           booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)",
           url="http://dx.doi.org/10.1145/304181.304188")
public class PROCLUS<V extends NumberVector<V,?>>
extends AbstractProjectedClustering<Clustering<Model>,V>

Provides the PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.

Reference:
C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park: Fast Algorithms for Projected Clustering.
In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).


Nested Class Summary
static class PROCLUS.Parameterizer<V extends NumberVector<V,?>>
          Parameterization class.
private  class PROCLUS.PROCLUSCluster
          Encapsulates the attributes of a cluster.
 
Field Summary
private static Logging logger
          The logger for this class.
private  int m_i
          Holds the value of M_I_ID.
static OptionID M_I_ID
          Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0.
private  Long seed
          Holds the value of SEED_ID.
static OptionID SEED_ID
          Parameter to specify the random generator seed.
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering
k, k_i, K_I_ID, K_ID, l, L_ID
 
Constructor Summary
PROCLUS(int k, int k_i, int l, int m_i, Long seed)
          Java constructor.
 
Method Summary
private  Map<DBID,PROCLUS.PROCLUSCluster> assignPoints(Map<DBID,Set<Integer>> dimensions, Relation<V> database)
          Assigns the objects to the clusters.
private  double avgDistance(V centroid, DBIDs objectIDs, Relation<V> database, int dimension)
          Computes the average distance of the objects to the centroid along the specified dimension.
private  ModifiableDBIDs computeBadMedoids(Map<DBID,PROCLUS.PROCLUSCluster> clusters, int threshold)
          Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.
private  ModifiableDBIDs computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, Random random)
          Computes the set of medoids in current iteration.
private  double evaluateClusters(Map<DBID,PROCLUS.PROCLUSCluster> clusters, Map<DBID,Set<Integer>> dimensions, Relation<V> database)
          Evaluates the quality of the clusters.
private  List<PROCLUS.PROCLUSCluster> finalAssignment(List<Pair<V,Set<Integer>>> dimensions, Relation<V> database)
          Refinement step to assign the objects to the final clusters.
private  Map<DBID,Set<Integer>> findDimensions(DBIDs medoids, Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, RangeQuery<V,DoubleDistance> rangeQuery)
          Determines the set of correlated dimensions for each medoid in the specified medoid set.
private  List<Pair<V,Set<Integer>>> findDimensions(List<PROCLUS.PROCLUSCluster> clusters, Relation<V> database)
          Refinement step that determines the set of correlated dimensions for each cluster centroid.
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
private  Map<DBID,List<DistanceResultPair<DoubleDistance>>> getLocalities(DBIDs medoids, Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, RangeQuery<V,DoubleDistance> rangeQuery)
          Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
private  ModifiableDBIDs greedy(DistanceQuery<V,DoubleDistance> distFunc, DBIDs sampleSet, int m, Random random)
          Returns a piercing set of k medoids from the specified sample set.
private  ModifiableDBIDs initialSet(DBIDs sampleSet, int k, Random random)
          Returns a set of k elements from the specified sample set.
private  DoubleDistance manhattanSegmentalDistance(V o1, V o2, Set<Integer> dimensions)
          Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
 Clustering<Model> run(Database database, Relation<V> relation)
          Performs the PROCLUS algorithm on the given database.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering
getDistanceFunction, getDistanceQuery
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 

Field Detail

logger

private static final Logging logger
The logger for this class.


M_I_ID

public static final OptionID M_I_ID
Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0.

Default value: 10

Key: -proclus.mi


SEED_ID

public static final OptionID SEED_ID
Parameter to specify the random generator seed.


m_i

private int m_i
Holds the value of M_I_ID.


seed

private Long seed
Holds the value of SEED_ID.

Constructor Detail

PROCLUS

public PROCLUS(int k,
               int k_i,
               int l,
               int m_i,
               Long seed)
Java constructor.

Parameters:
k - k Parameter
k_i - k_i Parameter
l - l Parameter
m_i - m_i Parameter
seed - Random generator seed
Method Detail

run

public Clustering<Model> run(Database database,
                             Relation<V> relation)
                      throws IllegalStateException
Performs the PROCLUS algorithm on the given database.

Throws:
IllegalStateException

greedy

private ModifiableDBIDs greedy(DistanceQuery<V,DoubleDistance> distFunc,
                               DBIDs sampleSet,
                               int m,
                               Random random)
Returns a piercing set of k medoids from the specified sample set.

Parameters:
distFunc - the distance function
sampleSet - the sample set
m - the number of medoids to be returned
random - random number generator
Returns:
a piercing set of m medoids from the specified sample set

initialSet

private ModifiableDBIDs initialSet(DBIDs sampleSet,
                                   int k,
                                   Random random)
Returns a set of k elements from the specified sample set.

Parameters:
sampleSet - the sample set
k - the number of samples to be returned
random - random number generator
Returns:
a set of k elements from the specified sample set

computeM_current

private ModifiableDBIDs computeM_current(DBIDs m,
                                         DBIDs m_best,
                                         DBIDs m_bad,
                                         Random random)
Computes the set of medoids in current iteration.

Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
random - random number generator
Returns:
m_current, the set of medoids in current iteration

getLocalities

private Map<DBID,List<DistanceResultPair<DoubleDistance>>> getLocalities(DBIDs medoids,
                                                                         Relation<V> database,
                                                                         DistanceQuery<V,DoubleDistance> distFunc,
                                                                         RangeQuery<V,DoubleDistance> rangeQuery)
Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i.

Parameters:
medoids - the ids of the medoids
database - the database holding the objects
distFunc - the distance function
Returns:
a mapping of the medoid's id to its locality

findDimensions

private Map<DBID,Set<Integer>> findDimensions(DBIDs medoids,
                                              Relation<V> database,
                                              DistanceQuery<V,DoubleDistance> distFunc,
                                              RangeQuery<V,DoubleDistance> rangeQuery)
Determines the set of correlated dimensions for each medoid in the specified medoid set.

Parameters:
medoids - the set of medoids
database - the database containing the objects
distFunc - the distance function
Returns:
the set of correlated dimensions for each medoid in the specified medoid set

findDimensions

private List<Pair<V,Set<Integer>>> findDimensions(List<PROCLUS.PROCLUSCluster> clusters,
                                                  Relation<V> database)
Refinement step that determines the set of correlated dimensions for each cluster centroid.

Parameters:
clusters - the list of clusters
database - the database containing the objects
Returns:
the set of correlated dimensions for each specified cluster centroid

assignPoints

private Map<DBID,PROCLUS.PROCLUSCluster> assignPoints(Map<DBID,Set<Integer>> dimensions,
                                                      Relation<V> database)
Assigns the objects to the clusters.

Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects
Returns:
the assignments of the object to the clusters

finalAssignment

private List<PROCLUS.PROCLUSCluster> finalAssignment(List<Pair<V,Set<Integer>>> dimensions,
                                                     Relation<V> database)
Refinement step to assign the objects to the final clusters.

Parameters:
dimensions - pair containing the centroid and the set of correlated dimensions for the centroid
database - the database containing the objects
Returns:
the assignments of the object to the clusters

manhattanSegmentalDistance

private DoubleDistance manhattanSegmentalDistance(V o1,
                                                  V o2,
                                                  Set<Integer> dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.

Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered
Returns:
the Manhattan segmental distance between o1 and o2 relative to the specified dimensions

evaluateClusters

private double evaluateClusters(Map<DBID,PROCLUS.PROCLUSCluster> clusters,
                                Map<DBID,Set<Integer>> dimensions,
                                Relation<V> database)
Evaluates the quality of the clusters.

Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects
Returns:
a measure for the cluster quality

avgDistance

private double avgDistance(V centroid,
                           DBIDs objectIDs,
                           Relation<V> database,
                           int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.

Parameters:
centroid - the centroid
objectIDs - the set of objects ids
database - the database holding the objects
dimension - the dimension for which the average distance is computed
Returns:
the average distance of the objects to the centroid along the specified dimension

computeBadMedoids

private ModifiableDBIDs computeBadMedoids(Map<DBID,PROCLUS.PROCLUSCluster> clusters,
                                          int threshold)
Computes the bad medoids, where the medoid of a cluster with less than the specified threshold of objects is bad.

Parameters:
clusters - the clusters
threshold - the threshold
Returns:
the bad medoids

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<Model>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<Clustering<Model>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)