
V - the type of NumberVector handled by this Algorithm

@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99)", url="http://dx.doi.org/10.1145/304181.304188")
public class PROCLUS<V extends NumberVector<?>>
extends AbstractProjectedClustering<Clustering<SubspaceModel<V>>,V>
implements SubspaceClusteringAlgorithm<SubspaceModel<V>>

| Modifier and Type | Class | Description |
|---|---|---|
| static class | PROCLUS.Parameterizer<V extends NumberVector<?>> | Parameterization class. |
| private class | PROCLUS.PROCLUSCluster | Encapsulates the attributes of a cluster. |

| Modifier and Type | Field | Description |
|---|---|---|
| private static Logging | LOG | The logger for this class. |
| private int | m_i | Holds the value of M_I_ID. |
| static OptionID | M_I_ID | Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0. |
| private RandomFactory | rnd | Random generator |

Fields inherited from class AbstractProjectedClustering: k, k_i, l

| Constructor | Description |
|---|---|
| PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd) | Java constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| private Map<DBID,PROCLUS.PROCLUSCluster> | assignPoints(Map<DBID,gnu.trove.set.TIntSet> dimensions, Relation<V> database) | Assigns the objects to the clusters. |
| private double | avgDistance(V centroid, DBIDs objectIDs, Relation<V> database, int dimension) | Computes the average distance of the objects to the centroid along the specified dimension. |
| private ModifiableDBIDs | computeBadMedoids(Map<DBID,PROCLUS.PROCLUSCluster> clusters, int threshold) | Computes the bad medoids, where the medoid of a cluster with fewer objects than the specified threshold is bad. |
| private ModifiableDBIDs | computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, Random random) | Computes the set of medoids in the current iteration. |
| private double | evaluateClusters(Map<DBID,PROCLUS.PROCLUSCluster> clusters, Map<DBID,gnu.trove.set.TIntSet> dimensions, Relation<V> database) | Evaluates the quality of the clusters. |
| private List<PROCLUS.PROCLUSCluster> | finalAssignment(List<Pair<V,gnu.trove.set.TIntSet>> dimensions, Relation<V> database) | Refinement step to assign the objects to the final clusters. |
| private Map<DBID,gnu.trove.set.TIntSet> | findDimensions(DBIDs medoids, Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, RangeQuery<V,DoubleDistance> rangeQuery) | Determines the set of correlated dimensions for each medoid in the specified medoid set. |
| private List<Pair<V,gnu.trove.set.TIntSet>> | findDimensions(List<PROCLUS.PROCLUSCluster> clusters, Relation<V> database) | Refinement step that determines the set of correlated dimensions for each cluster centroid. |
| TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query. |
| private Map<DBID,DistanceDBIDResult<DoubleDistance>> | getLocalities(DBIDs medoids, Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, RangeQuery<V,DoubleDistance> rangeQuery) | Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i. |
| protected Logging | getLogger() | Get the (STATIC) logger for this class. |
| private ModifiableDBIDs | greedy(DistanceQuery<V,DoubleDistance> distFunc, DBIDs sampleSet, int m, Random random) | Returns a piercing set of m medoids from the specified sample set. |
| private ModifiableDBIDs | initialSet(DBIDs sampleSet, int k, Random random) | Returns a set of k elements from the specified sample set. |
| private DoubleDistance | manhattanSegmentalDistance(V o1, V o2, gnu.trove.set.TIntSet dimensions) | Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
| Clustering<SubspaceModel<V>> | run(Database database, Relation<V> relation) | Performs the PROCLUS algorithm on the given database. |
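
In the PROCLUS paper, the Manhattan segmental distance named by `manhattanSegmentalDistance` is the Manhattan distance restricted to a set of dimensions, divided by the number of those dimensions. The following is a minimal sketch of that formula, not the ELKI source. It assumes plain `double[]` vectors and 0-based dimension indices held in a `java.util.Set<Integer>` instead of ELKI's `V` and `TIntSet` types, and the class name `ManhattanSegmentalSketch` is hypothetical.

```java
import java.util.Set;

/** Illustrative sketch only; names and types are assumptions, not ELKI API. */
final class ManhattanSegmentalSketch {

  /**
   * Average absolute coordinate difference between o1 and o2 over the
   * selected dimensions (0-based indices in this sketch).
   */
  static double distance(double[] o1, double[] o2, Set<Integer> dimensions) {
    if (o1.length != o2.length) {
      throw new IllegalArgumentException("vectors must have the same length");
    }
    if (dimensions.isEmpty()) {
      throw new IllegalArgumentException("at least one dimension is required");
    }
    double sum = 0.0;
    for (int d : dimensions) {
      // Only the selected dimensions contribute to the sum.
      sum += Math.abs(o1[d] - o2[d]);
    }
    // Normalizing by the subspace size keeps distances comparable
    // between clusters with different numbers of dimensions.
    return sum / dimensions.size();
  }
}
```

For the vectors (1, 5, 3) and (4, 5, 7) restricted to dimensions {0, 2}, the sum of absolute differences is 3 + 4 = 7, so the sketch returns 7 / 2 = 3.5.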
Inherited methods: getDistanceFunction, getDistanceQuery; makeParameterDistanceFunction, run; clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait (from java.lang.Object); run

private static final Logging LOG
public static final OptionID M_I_ID
 Default value: 10
 
 Key: -proclus.mi
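
As a worked example of how such a multiplier might be applied: the PROCLUS paper first draws a random sample of size A·k and then greedily reduces it to B·k candidate medoids. The sketch below assumes that `k_i` plays the role of A and `m_i` the role of B, and that both sizes are capped at the data set size. The class and method names are hypothetical, and the actual ELKI code may differ.

```java
/** Illustrative sketch only; not taken from the ELKI source. */
final class ProclusSampleSizes {

  /** Size of the initial random sample: k_i * k, capped at the data set size. */
  static int randomSampleSize(int k, int kI, int datasetSize) {
    return capped(k, kI, datasetSize);
  }

  /** Number of candidate medoids kept by the greedy step: m_i * k, capped. */
  static int medoidCandidates(int k, int mI, int datasetSize) {
    return capped(k, mI, datasetSize);
  }

  private static int capped(int k, int multiplier, int datasetSize) {
    if (k <= 0 || multiplier <= 0) {
      // The option text requires the multiplier to be greater than 0.
      throw new IllegalArgumentException("k and the multiplier must be > 0");
    }
    if (datasetSize < 0) {
      throw new IllegalArgumentException("datasetSize must not be negative");
    }
    // Multiply as long to avoid int overflow before taking the minimum.
    return (int) Math.min((long) datasetSize, (long) multiplier * k);
  }
}
```

With k = 5 and the default multiplier of 10, a data set of 1000 objects gives 50 candidate medoids under these assumptions; a data set of only 30 objects caps the result at 30.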
 
private int m_i
Holds the value of M_I_ID.

private RandomFactory rnd
public PROCLUS(int k,
       int k_i,
       int l,
       int m_i,
       RandomFactory rnd)
Parameters:
k - k Parameter
k_i - k_i Parameter
l - l Parameter
m_i - m_i Parameter
rnd - Random generator

public Clustering<SubspaceModel<V>> run(Database database, Relation<V> relation)
Parameters:
database - Database to process
relation - Relation to process

private ModifiableDBIDs greedy(DistanceQuery<V,DoubleDistance> distFunc, DBIDs sampleSet, int m, Random random)
Parameters:
distFunc - the distance function
sampleSet - the sample set
m - the number of medoids to be returned
random - random number generator

private ModifiableDBIDs initialSet(DBIDs sampleSet, int k, Random random)
Parameters:
sampleSet - the sample set
k - the number of samples to be returned
random - random number generator

private ModifiableDBIDs computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, Random random)
Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
random - random number generator

private Map<DBID,DistanceDBIDResult<DoubleDistance>> getLocalities(DBIDs medoids, Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, RangeQuery<V,DoubleDistance> rangeQuery)
Parameters:
medoids - the ids of the medoids
database - the database holding the objects
distFunc - the distance function

private Map<DBID,gnu.trove.set.TIntSet> findDimensions(DBIDs medoids, Relation<V> database, DistanceQuery<V,DoubleDistance> distFunc, RangeQuery<V,DoubleDistance> rangeQuery)
Parameters:
medoids - the set of medoids
database - the database containing the objects
distFunc - the distance function

private List<Pair<V,gnu.trove.set.TIntSet>> findDimensions(List<PROCLUS.PROCLUSCluster> clusters, Relation<V> database)
Parameters:
clusters - the list of clusters
database - the database containing the objects

private Map<DBID,PROCLUS.PROCLUSCluster> assignPoints(Map<DBID,gnu.trove.set.TIntSet> dimensions, Relation<V> database)
Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects

private List<PROCLUS.PROCLUSCluster> finalAssignment(List<Pair<V,gnu.trove.set.TIntSet>> dimensions, Relation<V> database)
Parameters:
dimensions - pair containing the centroid and the set of correlated dimensions for the centroid
database - the database containing the objects

private DoubleDistance manhattanSegmentalDistance(V o1, V o2, gnu.trove.set.TIntSet dimensions)
Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered

private double evaluateClusters(Map<DBID,PROCLUS.PROCLUSCluster> clusters, Map<DBID,gnu.trove.set.TIntSet> dimensions, Relation<V> database)
Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects

private double avgDistance(V centroid, DBIDs objectIDs, Relation<V> database, int dimension)
Parameters:
centroid - the centroid
objectIDs - the set of object IDs
database - the database holding the objects
dimension - the dimension for which the average distance is computed

private ModifiableDBIDs computeBadMedoids(Map<DBID,PROCLUS.PROCLUSCluster> clusters, int threshold)
Parameters:
clusters - the clusters
threshold - the threshold

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<Clustering<SubspaceModel<V extends NumberVector<?>>>>
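
The `greedy` method is documented as returning a piercing set of medoids from a sample. In the PROCLUS paper this is a farthest-first selection: start from a random point, then repeatedly add the point whose distance to its nearest already chosen medoid is largest. The sketch below illustrates that idea under stated assumptions and is not the ELKI source. Points are plain `double[]` rows, the result holds row indices rather than DBIDs, full-space Manhattan distance stands in for `distFunc`, and the class name `GreedyMedoidSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative farthest-first sketch; names and types are assumptions. */
final class GreedyMedoidSketch {

  /** Returns the indices of m well-separated points from the sample. */
  static List<Integer> greedy(double[][] sample, int m, Random random) {
    int n = sample.length;
    if (m < 1 || m > n) {
      throw new IllegalArgumentException("m must be between 1 and the sample size");
    }
    List<Integer> medoids = new ArrayList<>(m);
    boolean[] chosen = new boolean[n];
    // nearest[i] = distance from point i to its closest chosen medoid.
    double[] nearest = new double[n];

    // The first medoid is picked at random.
    int first = random.nextInt(n);
    chosen[first] = true;
    medoids.add(first);
    for (int i = 0; i < n; i++) {
      nearest[i] = manhattan(sample[i], sample[first]);
    }

    while (medoids.size() < m) {
      // Pick the unchosen point farthest from all current medoids.
      int best = -1;
      for (int i = 0; i < n; i++) {
        if (!chosen[i] && (best < 0 || nearest[i] > nearest[best])) {
          best = i;
        }
      }
      chosen[best] = true;
      medoids.add(best);
      // Update the nearest-medoid distances with the new medoid.
      for (int i = 0; i < n; i++) {
        nearest[i] = Math.min(nearest[i], manhattan(sample[i], sample[best]));
      }
    }
    return medoids;
  }

  private static double manhattan(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      sum += Math.abs(a[d] - b[d]);
    }
    return sum;
  }
}
```

Keeping the `nearest` array up to date means each added medoid costs one pass over the sample, so selecting m medoids from n points takes on the order of n·m distance computations.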