java.lang.Object
de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<R>
de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering<Clustering<Model>,V>
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.PROCLUS<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm
@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park",
title="Fast Algorithms for Projected Clustering",
booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)",
url="http://dx.doi.org/10.1145/304181.304188")
public class PROCLUS<V extends NumberVector<V,?>>
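Provides the PROCLUS algorithm, an algorithm to find subspace clusters in high-dimensional spaces.
Reference: C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park: Fast Algorithms for Projected Clustering. In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).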
| Nested Class Summary | |
|---|---|
static class |
PROCLUS.Parameterizer<V extends NumberVector<V,?>>
Parameterization class. |
private class |
PROCLUS.PROCLUSCluster
Encapsulates the attributes of a cluster. |
| Field Summary | |
|---|---|
private static Logging |
logger
The logger for this class. |
private int |
m_i
Holds the value of M_I_ID. |
static OptionID |
M_I_ID
Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0. |
private Long |
seed
Holds the value of SEED_ID. |
static OptionID |
SEED_ID
Parameter to specify the random generator seed. |
| Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering |
|---|
k, k_i, K_I_ID, K_ID, l, L_ID |
| Constructor Summary | |
|---|---|
PROCLUS(int k,
int k_i,
int l,
int m_i,
Long seed)
Java constructor. |
|
| Method Summary | |
|---|---|
private Map<DBID,PROCLUS.PROCLUSCluster> |
assignPoints(Map<DBID,Set<Integer>> dimensions,
Relation<V> database)
Assigns the objects to the clusters. |
private double |
avgDistance(V centroid,
DBIDs objectIDs,
Relation<V> database,
int dimension)
Computes the average distance of the objects to the centroid along the specified dimension. |
private ModifiableDBIDs |
computeBadMedoids(Map<DBID,PROCLUS.PROCLUSCluster> clusters,
int threshold)
Computes the bad medoids: a medoid is bad if its cluster contains fewer objects than the specified threshold. |
private ModifiableDBIDs |
computeM_current(DBIDs m,
DBIDs m_best,
DBIDs m_bad,
Random random)
Computes the set of medoids in the current iteration. |
private double |
evaluateClusters(Map<DBID,PROCLUS.PROCLUSCluster> clusters,
Map<DBID,Set<Integer>> dimensions,
Relation<V> database)
Evaluates the quality of the clusters. |
private List<PROCLUS.PROCLUSCluster> |
finalAssignment(List<Pair<V,Set<Integer>>> dimensions,
Relation<V> database)
Refinement step to assign the objects to the final clusters. |
private Map<DBID,Set<Integer>> |
findDimensions(DBIDs medoids,
Relation<V> database,
DistanceQuery<V,DoubleDistance> distFunc,
RangeQuery<V,DoubleDistance> rangeQuery)
Determines the set of correlated dimensions for each medoid in the specified medoid set. |
private List<Pair<V,Set<Integer>>> |
findDimensions(List<PROCLUS.PROCLUSCluster> clusters,
Relation<V> database)
Refinement step that determines the set of correlated dimensions for each cluster centroid. |
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query. |
private Map<DBID,List<DistanceResultPair<DoubleDistance>>> |
getLocalities(DBIDs medoids,
Relation<V> database,
DistanceQuery<V,DoubleDistance> distFunc,
RangeQuery<V,DoubleDistance> rangeQuery)
Computes the localities of the specified medoids: for each medoid m the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between medoid m and any other medoid m_i. |
protected Logging |
getLogger()
Get the (STATIC) logger for this class. |
private ModifiableDBIDs |
greedy(DistanceQuery<V,DoubleDistance> distFunc,
DBIDs sampleSet,
int m,
Random random)
Returns a piercing set of m medoids from the specified sample set. |
private ModifiableDBIDs |
initialSet(DBIDs sampleSet,
int k,
Random random)
Returns a set of k elements from the specified sample set. |
private DoubleDistance |
manhattanSegmentalDistance(V o1,
V o2,
Set<Integer> dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
Clustering<Model> |
run(Database database,
Relation<V> relation)
Performs the PROCLUS algorithm on the given database. |
| Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.AbstractProjectedClustering |
|---|
getDistanceFunction, getDistanceQuery |
| Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm |
|---|
makeParameterDistanceFunction, run |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm |
|---|
run |
| Field Detail |
|---|
private static final Logging logger
public static final OptionID M_I_ID
Default value: 10
Key: -proclus.mi
public static final OptionID SEED_ID
private int m_i
Holds the value of M_I_ID.
private Long seed
Holds the value of SEED_ID.
| Constructor Detail |
|---|
public PROCLUS(int k,
int k_i,
int l,
int m_i,
Long seed)
Parameters:
k - k Parameter
k_i - k_i Parameter
l - l Parameter
m_i - m_i Parameter
seed - Random generator seed
| Method Detail |
|---|
public Clustering<Model> run(Database database,
Relation<V> relation)
throws IllegalStateException
Throws:
IllegalStateException
private ModifiableDBIDs greedy(DistanceQuery<V,DoubleDistance> distFunc,
DBIDs sampleSet,
int m,
Random random)
Parameters:
distFunc - the distance function
sampleSet - the sample set
m - the number of medoids to be returned
random - random number generator
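A piercing set is commonly built by farthest-first traversal: start from a random object, then repeatedly add the object whose distance to its nearest already-chosen medoid is largest. The sketch below illustrates that idea only. It is not the source of this private method; the class name `GreedySketch`, the use of `double[][]` rows in place of `DBIDs`, and the Euclidean distance in place of `distFunc` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of ELKI.
final class GreedySketch {
  // Assumed distance; the real method receives a DistanceQuery instead.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Returns the row indices of m well-separated objects of the sample. */
  static List<Integer> greedy(double[][] sample, int m, Random random) {
    int n = sample.length;
    if (m < 1 || m > n) {
      throw new IllegalArgumentException("m must be between 1 and the sample size");
    }
    List<Integer> medoids = new ArrayList<>();
    boolean[] chosen = new boolean[n];
    // dist[i] holds the distance from object i to its nearest chosen medoid.
    double[] dist = new double[n];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);

    int current = random.nextInt(n); // the first medoid is picked at random
    chosen[current] = true;
    medoids.add(current);

    while (medoids.size() < m) {
      int farthest = -1;
      for (int i = 0; i < n; i++) {
        if (chosen[i]) {
          continue;
        }
        // Account for the medoid added in the previous round.
        dist[i] = Math.min(dist[i], euclidean(sample[i], sample[current]));
        if (farthest < 0 || dist[i] > dist[farthest]) {
          farthest = i;
        }
      }
      current = farthest;
      chosen[current] = true;
      medoids.add(current);
    }
    return medoids;
  }
}
```

Only the first pick depends on `random`; every later pick is determined by the distances, so a fixed seed makes the whole result reproducible.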
private ModifiableDBIDs initialSet(DBIDs sampleSet,
int k,
Random random)
Parameters:
sampleSet - the sample set
k - the number of samples to be returned
random - random number generator
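Drawing k elements without replacement can be done by shuffling a copy of the sample set and keeping the first k entries. This is a minimal sketch under stated assumptions: `InitialSetSketch` is a hypothetical class, integer ids stand in for `DBIDs`, and the shuffle-based approach is one possible technique, not necessarily the one used by this method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not part of ELKI.
final class InitialSetSketch {
  /** Returns k distinct ids drawn at random from sampleSet. */
  static List<Integer> initialSet(List<Integer> sampleSet, int k, Random random) {
    if (k < 0 || k > sampleSet.size()) {
      throw new IllegalArgumentException("k must be between 0 and the sample size");
    }
    // Shuffle a copy so the caller's list is left untouched.
    List<Integer> copy = new ArrayList<>(sampleSet);
    Collections.shuffle(copy, random);
    return new ArrayList<>(copy.subList(0, k));
  }
}
```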
private ModifiableDBIDs computeM_current(DBIDs m,
DBIDs m_best,
DBIDs m_bad,
Random random)
Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
random - random number generator
private Map<DBID,List<DistanceResultPair<DoubleDistance>>> getLocalities(DBIDs medoids,
Relation<V> database,
DistanceQuery<V,DoubleDistance> distFunc,
RangeQuery<V,DoubleDistance> rangeQuery)
Parameters:
medoids - the ids of the medoids
database - the database holding the objects
distFunc - the distance function
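The locality of a medoid, as described in the method summary, is the set of objects inside the sphere around that medoid whose radius is the distance to the nearest other medoid. The following sketch shows that computation on plain arrays. The class `LocalitiesSketch`, the Euclidean distance, the row-index representation of ids, and the choice to include objects lying exactly on the sphere boundary are assumptions, not details taken from the class itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of ELKI.
final class LocalitiesSketch {
  // Assumed distance; the real method receives a DistanceQuery instead.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Maps each medoid (a row index of data) to the row indices in its locality. */
  static Map<Integer, List<Integer>> localities(double[][] data, int[] medoids) {
    Map<Integer, List<Integer>> result = new LinkedHashMap<>();
    for (int m : medoids) {
      // minDist: distance from m to the closest other medoid.
      // With a single medoid it stays infinite, so every object qualifies.
      double minDist = Double.POSITIVE_INFINITY;
      for (int other : medoids) {
        if (other != m) {
          minDist = Math.min(minDist, euclidean(data[m], data[other]));
        }
      }
      // Collect every object within radius minDist (boundary included by assumption).
      List<Integer> locality = new ArrayList<>();
      for (int i = 0; i < data.length; i++) {
        if (euclidean(data[m], data[i]) <= minDist) {
          locality.add(i);
        }
      }
      result.put(m, locality);
    }
    return result;
  }
}
```

This version scans all objects for every medoid; the `rangeQuery` argument in the real signature suggests that the range search can be delegated to a query object instead.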
private Map<DBID,Set<Integer>> findDimensions(DBIDs medoids,
Relation<V> database,
DistanceQuery<V,DoubleDistance> distFunc,
RangeQuery<V,DoubleDistance> rangeQuery)
Parameters:
medoids - the set of medoids
database - the database containing the objects
distFunc - the distance function
private List<Pair<V,Set<Integer>>> findDimensions(List<PROCLUS.PROCLUSCluster> clusters,
Relation<V> database)
Parameters:
clusters - the list of clusters
database - the database containing the objects
private Map<DBID,PROCLUS.PROCLUSCluster> assignPoints(Map<DBID,Set<Integer>> dimensions,
Relation<V> database)
Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects
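In projected clustering, a natural assignment rule is to give each object to the medoid that is closest when the distance is measured only in that medoid's own dimensions. The sketch below assumes this rule together with the Manhattan segmental distance; the page itself only states that objects are assigned to clusters, so the rule, the class `AssignPointsSketch`, the array-based data, and the 0-based dimension indices are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of ELKI.
final class AssignPointsSketch {
  // Mean absolute difference over the given 0-based dimensions.
  static double segmental(double[] a, double[] b, Set<Integer> dims) {
    if (dims.isEmpty()) {
      throw new IllegalArgumentException("dimension set must not be empty");
    }
    double sum = 0.0;
    for (int d : dims) {
      sum += Math.abs(a[d] - b[d]);
    }
    return sum / dims.size();
  }

  /**
   * dimensions maps a medoid (row index of data) to its dimension set.
   * The result maps each medoid to the row indices assigned to it.
   */
  static Map<Integer, List<Integer>> assign(double[][] data,
                                            Map<Integer, Set<Integer>> dimensions) {
    if (dimensions.isEmpty()) {
      throw new IllegalArgumentException("at least one medoid is required");
    }
    Map<Integer, List<Integer>> clusters = new LinkedHashMap<>();
    for (int medoid : dimensions.keySet()) {
      clusters.put(medoid, new ArrayList<>());
    }
    for (int i = 0; i < data.length; i++) {
      int best = -1;
      double bestDist = Double.POSITIVE_INFINITY;
      for (Map.Entry<Integer, Set<Integer>> e : dimensions.entrySet()) {
        // Distance is evaluated in the subspace belonging to this medoid.
        double d = segmental(data[i], data[e.getKey()], e.getValue());
        if (best < 0 || d < bestDist) {
          bestDist = d;
          best = e.getKey();
        }
      }
      clusters.get(best).add(i);
    }
    return clusters;
  }
}
```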
private List<PROCLUS.PROCLUSCluster> finalAssignment(List<Pair<V,Set<Integer>>> dimensions,
Relation<V> database)
Parameters:
dimensions - pair containing the centroid and the set of correlated dimensions for the centroid
database - the database containing the objects
private DoubleDistance manhattanSegmentalDistance(V o1,
V o2,
Set<Integer> dimensions)
Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered
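The Manhattan segmental distance, as defined in the cited PROCLUS paper, is the Manhattan distance restricted to a set of dimensions D and divided by |D|, so that distances taken over subspaces of different sizes stay comparable. A minimal sketch follows, assuming `double[]` vectors in place of `V`, 0-based dimension indices, and a hypothetical class name `ManhattanSegmentalSketch`; the indexing convention of the real method is not stated on this page.

```java
import java.util.Set;

// Hypothetical stand-alone sketch; not part of ELKI.
final class ManhattanSegmentalSketch {
  /** Mean absolute coordinate difference over the given 0-based dimensions. */
  static double distance(double[] o1, double[] o2, Set<Integer> dimensions) {
    if (dimensions.isEmpty()) {
      throw new IllegalArgumentException("dimension set must not be empty");
    }
    double sum = 0.0;
    for (int d : dimensions) {
      sum += Math.abs(o1[d] - o2[d]); // Manhattan term for dimension d
    }
    return sum / dimensions.size(); // normalize by |D|
  }
}
```

For example, with o1 = (1, 5, 9), o2 = (4, 5, 1) and dimensions {0, 2}, the result is (3 + 8) / 2 = 5.5.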
private double evaluateClusters(Map<DBID,PROCLUS.PROCLUSCluster> clusters,
Map<DBID,Set<Integer>> dimensions,
Relation<V> database)
Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects
private double avgDistance(V centroid,
DBIDs objectIDs,
Relation<V> database,
int dimension)
Parameters:
centroid - the centroid
objectIDs - the set of object IDs
database - the database holding the objects
dimension - the dimension for which the average distance is computed
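The average distance along one dimension is the mean of the absolute coordinate differences between the centroid and each object in that dimension. The sketch below shows the arithmetic; `AvgDistanceSketch` is a hypothetical class, and passing the objects directly as `double[]` rows (instead of resolving `DBIDs` against a `Relation`) with a 0-based dimension index is an assumption.

```java
import java.util.List;

// Hypothetical stand-alone sketch; not part of ELKI.
final class AvgDistanceSketch {
  /** Mean of |centroid[dimension] - o[dimension]| over all objects. */
  static double avgDistance(double[] centroid, List<double[]> objects, int dimension) {
    if (objects.isEmpty()) {
      throw new IllegalArgumentException("object set must not be empty");
    }
    double sum = 0.0;
    for (double[] o : objects) {
      sum += Math.abs(centroid[dimension] - o[dimension]);
    }
    return sum / objects.size();
  }
}
```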
private ModifiableDBIDs computeBadMedoids(Map<DBID,PROCLUS.PROCLUSCluster> clusters,
int threshold)
Parameters:
clusters - the clusters
threshold - the threshold
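Following the summary description, a medoid counts as bad when its cluster holds fewer objects than the threshold. A minimal sketch of that filter, assuming integer ids and a plain `Map` from medoid id to member ids in place of `Map<DBID,PROCLUS.PROCLUSCluster>`; the class name `BadMedoidsSketch` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; not part of ELKI.
final class BadMedoidsSketch {
  /** Returns the medoid ids whose clusters have fewer than threshold members. */
  static Set<Integer> computeBadMedoids(Map<Integer, List<Integer>> clusters, int threshold) {
    Set<Integer> bad = new TreeSet<>();
    for (Map.Entry<Integer, List<Integer>> e : clusters.entrySet()) {
      if (e.getValue().size() < threshold) {
        bad.add(e.getKey());
      }
    }
    return bad;
  }
}
```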
public TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<Model>>
protected Logging getLogger()
AbstractAlgorithm
getLogger in class AbstractAlgorithm<Clustering<Model>>