java.lang.Object
de.lmu.ifi.dbs.elki.logging.AbstractLoggable
de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<Model>>
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering<V>
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.PROCLUS<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm

@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park",
title="Fast Algorithms for Projected Clustering",
booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD \'99)",
url="http://dx.doi.org/10.1145/304181.304188")
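public class PROCLUS<V extends NumberVector<V,?>>
extends ProjectedClustering<V>

Provides the PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.

Reference: C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park: Fast Algorithms for Projected Clustering. In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).
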
| Nested Class Summary | |
|---|---|
| private class | PROCLUS.PROCLUSCluster: Encapsulates the attributes of a cluster. |

| Field Summary | |
|---|---|
| private int | m_i: Holds the value of M_I_PARAM. |
| static OptionID | M_I_ID: OptionID for M_I_PARAM. |
| private IntParameter | M_I_PARAM: Parameter to specify the multiplier for the initial number of medoids; must be an integer greater than 0. |

| Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering |
|---|
| K_I_ID, K_ID, L_ID |

| Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
|---|
| debug, logger |

| Constructor Summary | |
|---|---|
| PROCLUS(Parameterization config) | Constructor, adhering to Parameterizable. |

| Method Summary | |
|---|---|
| private Map<Integer,PROCLUS.PROCLUSCluster> | assignPoints(Map<Integer,Set<Integer>> dimensions, Database<V> database): Assigns the objects to the clusters. |
| private double | avgDistance(V centroid, Set<Integer> objectIDs, Database<V> database, int dimension): Computes the average distance of the objects to the centroid along the specified dimension. |
| private Set<Integer> | computeBadMedoids(Map<Integer,PROCLUS.PROCLUSCluster> clusters, int threshold): Computes the bad medoids; the medoid of a cluster with fewer objects than the specified threshold is bad. |
| private Set<Integer> | computeM_current(Set<Integer> m, Set<Integer> m_best, Set<Integer> m_bad): Computes the set of medoids in the current iteration. |
| private double | evaluateClusters(Map<Integer,PROCLUS.PROCLUSCluster> clusters, Map<Integer,Set<Integer>> dimensions, Database<V> database): Evaluates the quality of the clusters. |
| private Map<Integer,Set<Integer>> | findDimensions(Set<Integer> medoids, Database<V> database, Map<Integer,List<DistanceResultPair<DoubleDistance>>> localities): Determines the set of correlated dimensions for each medoid in the specified medoid set. |
| private Map<Integer,List<DistanceResultPair<DoubleDistance>>> | getLocalities(Set<Integer> m_c, Database<V> database): Computes the localities of the specified medoids. |
| private Set<Integer> | greedy(Set<Integer> sampleSet, int m): Returns a piercing set of m medoids from the specified sample set. |
| private Set<Integer> | initialSet(Set<Integer> sampleSet, int k): Returns a set of k elements from the specified sample set. |
| private DoubleDistance | manhattanSegmentalDistance(V o1, V o2, Set<Integer> dimensions): Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
| protected Clustering<Model> | runInTime(Database<V> database): Performs the PROCLUS algorithm on the given database. |

| Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering |
|---|
| getDistanceFunction, getK_i, getK, getL |

| Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm |
|---|
| isTime, isVerbose, run, setTime, setVerbose |

| Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
|---|
| debugFine, debugFiner, debugFinest, exception, progress, verbose, warning |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm |
|---|
| run |

| Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm |
|---|
| setTime, setVerbose |

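| Field Detail |
|---|

public static final OptionID M_I_ID
OptionID for M_I_PARAM.

private final IntParameter M_I_PARAM
Parameter to specify the multiplier for the initial number of medoids; must be an integer greater than 0.
Default value: 10
Key: -proclus.mi

private int m_i
Holds the value of M_I_PARAM.
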
| Constructor Detail |
|---|

public PROCLUS(Parameterization config)
Constructor, adhering to Parameterizable.
Parameters:
config - Parameterization

| Method Detail |
|---|

protected Clustering<Model> runInTime(Database<V> database)
                             throws IllegalStateException
Performs the PROCLUS algorithm on the given database.
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<Model>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has not been called)

private Set<Integer> greedy(Set<Integer> sampleSet,
                            int m)
Returns a piercing set of m medoids from the specified sample set.
Parameters:
sampleSet - the sample set
m - the number of medoids to be returned

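The PROCLUS paper obtains its piercing set with a greedy farthest-first selection: start from a random sample, then repeatedly add the sample that is farthest from all medoids chosen so far. The following is a minimal sketch of that idea, not ELKI's code. The class name `GreedyMedoidSketch`, the extra `points` and `random` parameters, and the full-dimensional Manhattan distance are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

final class GreedyMedoidSketch {
  // Full-dimensional Manhattan distance (an assumption of this sketch).
  static double manhattan(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      sum += Math.abs(a[d] - b[d]);
    }
    return sum;
  }

  // Farthest-first selection of m ids from sampleSet; points maps id -> coordinates.
  static Set<Integer> greedy(Map<Integer, double[]> points, Set<Integer> sampleSet,
      int m, Random random) {
    if (m < 1 || m > sampleSet.size()) {
      throw new IllegalArgumentException("m must be between 1 and the sample size");
    }
    List<Integer> candidates = new ArrayList<>(sampleSet);
    Set<Integer> medoids = new LinkedHashSet<>();

    // The first medoid is drawn at random.
    int first = candidates.remove(random.nextInt(candidates.size()));
    medoids.add(first);

    // nearest.get(id) is the distance from id to its closest chosen medoid.
    Map<Integer, Double> nearest = new HashMap<>();
    for (int id : candidates) {
      nearest.put(id, manhattan(points.get(id), points.get(first)));
    }

    while (medoids.size() < m) {
      // Pick the candidate that is farthest from every chosen medoid.
      int farthest = candidates.get(0);
      for (int id : candidates) {
        if (nearest.get(id) > nearest.get(farthest)) {
          farthest = id;
        }
      }
      medoids.add(farthest);
      candidates.remove(Integer.valueOf(farthest));
      // Update the nearest-medoid distances for the remaining candidates.
      for (int id : candidates) {
        double dist = manhattan(points.get(id), points.get(farthest));
        nearest.put(id, Math.min(nearest.get(id), dist));
      }
    }
    return medoids;
  }
}
```

Because each pick maximizes the distance to the medoids already chosen, the result tends to spread over the sample rather than concentrate in one dense region.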
private Set<Integer> initialSet(Set<Integer> sampleSet,
                                int k)
Returns a set of k elements from the specified sample set.
Parameters:
sampleSet - the sample set
k - the number of samples to be returned

private Set<Integer> computeM_current(Set<Integer> m,
                                      Set<Integer> m_best,
                                      Set<Integer> m_bad)
Computes the set of medoids in the current iteration.
Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids

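One plausible reading of this step, following the iterative phase of the PROCLUS paper, is that the current medoid set equals the best set found so far with every bad medoid swapped for a randomly drawn, unused candidate from m. The sketch below illustrates that reading only. The class name `CurrentMedoidsSketch`, the `random` parameter, and the choice to keep a bad medoid when no replacement is left are assumptions, not ELKI's behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class CurrentMedoidsSketch {
  static Set<Integer> computeMCurrent(Set<Integer> m, Set<Integer> mBest,
      Set<Integer> mBad, Random random) {
    // Replacement pool: candidate medoids that are not part of the best set.
    List<Integer> pool = new ArrayList<>(m);
    pool.removeAll(mBest);

    Set<Integer> current = new HashSet<>();
    for (int medoid : mBest) {
      if (mBad.contains(medoid) && !pool.isEmpty()) {
        // Swap the bad medoid for a random unused candidate.
        current.add(pool.remove(random.nextInt(pool.size())));
      } else {
        // Good medoids are kept; so is a bad one if the pool is exhausted.
        current.add(medoid);
      }
    }
    return current;
  }
}
```

The returned set always has the same size as `mBest`, so the number of clusters stays fixed across iterations.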
private Map<Integer,List<DistanceResultPair<DoubleDistance>>> getLocalities(Set<Integer> m_c,
                                                                            Database<V> database)
Computes the localities of the specified medoids.
Parameters:
m_c - the ids of the medoids
database - the database holding the objects

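In the PROCLUS paper, the locality of a medoid is the set of points that lie within the distance from that medoid to its nearest other medoid. The sketch below illustrates that definition under simplifying assumptions: the database is a plain `Map` from id to coordinates, distances are full-dimensional Manhattan distances, and the result holds ids only rather than ELKI's `DistanceResultPair` entries. The class name `LocalitySketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LocalitySketch {
  // Full-dimensional Manhattan distance (an assumption of this sketch).
  static double manhattan(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      sum += Math.abs(a[d] - b[d]);
    }
    return sum;
  }

  static Map<Integer, List<Integer>> getLocalities(Set<Integer> medoids,
      Map<Integer, double[]> database) {
    Map<Integer, List<Integer>> localities = new HashMap<>();
    for (int medoid : medoids) {
      double[] center = database.get(medoid);

      // Radius: distance to the nearest other medoid.
      double radius = Double.POSITIVE_INFINITY;
      for (int other : medoids) {
        if (other != medoid) {
          radius = Math.min(radius, manhattan(center, database.get(other)));
        }
      }

      // Locality: every object within that radius, the medoid itself included.
      List<Integer> locality = new ArrayList<>();
      for (Map.Entry<Integer, double[]> entry : database.entrySet()) {
        if (manhattan(center, entry.getValue()) <= radius) {
          locality.add(entry.getKey());
        }
      }
      localities.put(medoid, locality);
    }
    return localities;
  }
}
```

With a single medoid the radius stays infinite, so the locality is the whole database in this sketch.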
private Map<Integer,Set<Integer>> findDimensions(Set<Integer> medoids,
                                                 Database<V> database,
                                                 Map<Integer,List<DistanceResultPair<DoubleDistance>>> localities)
Determines the set of correlated dimensions for each medoid in the specified medoid set.
Parameters:
medoids - the set of medoids
database - the database containing the objects
localities - the localities of the specified medoids

private Map<Integer,PROCLUS.PROCLUSCluster> assignPoints(Map<Integer,Set<Integer>> dimensions,
                                                         Database<V> database)
Assigns the objects to the clusters.
Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects

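In PROCLUS, each object is assigned to the medoid that is closest under the Manhattan segmental distance, measured only in that medoid's own dimensions. A minimal sketch of the assignment follows. It assumes a plain `Map` from id to coordinates in place of `Database<V>`, 0-based dimension indices, and a result that maps each medoid id to its member ids instead of `PROCLUSCluster` objects. The class name `AssignPointsSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class AssignPointsSketch {
  // Mean absolute coordinate difference over the given dimensions.
  static double segmental(double[] a, double[] b, Set<Integer> dims) {
    double sum = 0.0;
    for (int d : dims) {
      sum += Math.abs(a[d] - b[d]);
    }
    return sum / dims.size();
  }

  // dimensions maps medoid id -> its dimensions; database maps id -> coordinates.
  static Map<Integer, Set<Integer>> assignPoints(Map<Integer, Set<Integer>> dimensions,
      Map<Integer, double[]> database) {
    if (dimensions.isEmpty()) {
      throw new IllegalArgumentException("at least one medoid is required");
    }
    Map<Integer, Set<Integer>> clusters = new HashMap<>();
    for (int medoid : dimensions.keySet()) {
      clusters.put(medoid, new HashSet<>());
    }
    for (Map.Entry<Integer, double[]> object : database.entrySet()) {
      int best = -1;
      double bestDist = Double.POSITIVE_INFINITY;
      for (Map.Entry<Integer, Set<Integer>> medoid : dimensions.entrySet()) {
        // Distance is measured in the subspace of the candidate medoid.
        double dist = segmental(object.getValue(), database.get(medoid.getKey()),
            medoid.getValue());
        if (best == -1 || dist < bestDist) {
          bestDist = dist;
          best = medoid.getKey();
        }
      }
      clusters.get(best).add(object.getKey());
    }
    return clusters;
  }
}
```

Because every medoid uses its own dimension set, two medoids may rank the same object very differently. Ties go to whichever medoid the map iterates first.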
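private DoubleDistance manhattanSegmentalDistance(V o1,
                                                  V o2,
                                                  Set<Integer> dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered
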
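The Manhattan segmental distance relative to a dimension set D is the sum of the absolute coordinate differences over the dimensions in D, divided by the number of dimensions in D. The sketch below computes it on plain `double[]` vectors with 0-based dimension indices and returns a `double`; these are simplifications for illustration, whereas the ELKI method works on `V` and returns a `DoubleDistance`. The class name `SegmentalDistanceSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SegmentalDistanceSketch {
  // Mean absolute coordinate difference over the selected dimensions.
  static double manhattanSegmentalDistance(double[] o1, double[] o2,
      Set<Integer> dimensions) {
    if (dimensions.isEmpty()) {
      throw new IllegalArgumentException("at least one dimension is required");
    }
    double sum = 0.0;
    for (int d : dimensions) {
      sum += Math.abs(o1[d] - o2[d]);
    }
    // Normalizing by |D| makes distances over different subspace sizes comparable.
    return sum / dimensions.size();
  }

  public static void main(String[] args) {
    double[] a = {1.0, 5.0, 2.0};
    double[] b = {4.0, 5.0, 0.0};
    Set<Integer> dims = new HashSet<>(Arrays.asList(0, 2));
    // (|1 - 4| + |2 - 0|) / 2
    System.out.println(manhattanSegmentalDistance(a, b, dims)); // prints 2.5
  }
}
```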
private double evaluateClusters(Map<Integer,PROCLUS.PROCLUSCluster> clusters,
                                Map<Integer,Set<Integer>> dimensions,
                                Database<V> database)
Evaluates the quality of the clusters.
Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects

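The PROCLUS paper scores a clustering by the size-weighted average, over all clusters, of the mean per-dimension distance between cluster members and the cluster centroid, divided by the total number of objects. The sketch below follows that description and may differ from the ELKI implementation in detail. Clusters are represented as maps from medoid id to member ids, the database as a map from id to coordinates, and the class name `EvaluateClustersSketch` is hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class EvaluateClustersSketch {
  static double evaluateClusters(Map<Integer, Set<Integer>> clusters,
      Map<Integer, Set<Integer>> dimensions, Map<Integer, double[]> database) {
    if (database.isEmpty()) {
      throw new IllegalArgumentException("database must not be empty");
    }
    double weighted = 0.0;
    for (Map.Entry<Integer, Set<Integer>> cluster : clusters.entrySet()) {
      Set<Integer> members = cluster.getValue();
      Set<Integer> dims = dimensions.get(cluster.getKey());
      if (members.isEmpty() || dims == null || dims.isEmpty()) {
        continue; // nothing to measure for this cluster
      }
      double spread = 0.0;
      for (int d : dims) {
        // Centroid coordinate along dimension d.
        double mean = 0.0;
        for (int id : members) {
          mean += database.get(id)[d];
        }
        mean /= members.size();
        // Average distance of the members to the centroid along d.
        double deviation = 0.0;
        for (int id : members) {
          deviation += Math.abs(database.get(id)[d] - mean);
        }
        spread += deviation / members.size();
      }
      // Weight the cluster's mean spread by its size.
      weighted += members.size() * (spread / dims.size());
    }
    return weighted / database.size();
  }
}
```

Under this measure, lower values indicate tighter clusters within their assigned dimensions.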
private double avgDistance(V centroid,
                           Set<Integer> objectIDs,
                           Database<V> database,
                           int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.
Parameters:
centroid - the centroid
objectIDs - the set of object ids
database - the database holding the objects
dimension - the dimension for which the average distance is computed

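The average distance along one dimension reduces to the mean absolute difference between each object's coordinate and the centroid's coordinate in that dimension. A minimal sketch follows, assuming a plain `Map` from id to coordinates in place of `Database<V>` and a 0-based dimension index. The class name `AvgDistanceSketch` is hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class AvgDistanceSketch {
  static double avgDistance(double[] centroid, Set<Integer> objectIDs,
      Map<Integer, double[]> database, int dimension) {
    if (objectIDs.isEmpty()) {
      throw new IllegalArgumentException("at least one object id is required");
    }
    double sum = 0.0;
    for (int id : objectIDs) {
      // Only the coordinate in the requested dimension contributes.
      sum += Math.abs(database.get(id)[dimension] - centroid[dimension]);
    }
    return sum / objectIDs.size();
  }
}
```

For a centroid at 2.0 and two objects at 1.0 and 5.0 in the chosen dimension, the result is (1 + 3) / 2 = 2.0.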
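private Set<Integer> computeBadMedoids(Map<Integer,PROCLUS.PROCLUSCluster> clusters,
                                       int threshold)
Computes the bad medoids; the medoid of a cluster with fewer objects than the specified threshold is bad.
Parameters:
clusters - the clusters
threshold - the threshold
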
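The rule stated here, that a medoid is bad when its cluster holds fewer objects than the threshold, can be sketched as a simple filter. The example assumes clusters are given as a map from medoid id to member ids rather than `PROCLUSCluster` objects, and the class name `BadMedoidsSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BadMedoidsSketch {
  static Set<Integer> computeBadMedoids(Map<Integer, Set<Integer>> clusters,
      int threshold) {
    Set<Integer> bad = new HashSet<>();
    for (Map.Entry<Integer, Set<Integer>> cluster : clusters.entrySet()) {
      // A medoid is bad when its cluster is smaller than the threshold.
      if (cluster.getValue().size() < threshold) {
        bad.add(cluster.getKey());
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    Map<Integer, Set<Integer>> clusters = new HashMap<>();
    clusters.put(1, new HashSet<>(Arrays.asList(10, 11, 12)));
    clusters.put(2, new HashSet<>(Arrays.asList(13)));
    System.out.println(computeBadMedoids(clusters, 2)); // prints [2]
  }
}
```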