java.lang.Object
  de.lmu.ifi.dbs.elki.logging.AbstractLoggable
    de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<Model>>
      de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering<V>
        de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.PROCLUS<V>
Type Parameters:
V - the type of NumberVector handled by this Algorithm

@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99)", url="http://dx.doi.org/10.1145/304181.304188")
public class PROCLUS<V extends NumberVector<V,?>>
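Provides the PROCLUS algorithm, an algorithm to find subspace clusters in high dimensional spaces.
Reference: C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park: Fast Algorithms for Projected Clustering. In: Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99).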
Nested Class Summary | |
---|---|
private class | PROCLUS.PROCLUSCluster: Encapsulates the attributes of a cluster. |
Field Summary | |
---|---|
private int | m_i: Holds the value of M_I_PARAM. |
static OptionID | M_I_ID: OptionID for M_I_PARAM. |
private IntParameter | M_I_PARAM: Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0. |
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering |
---|
K_I_ID, K_ID, L_ID |
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
---|
debug, logger |
Constructor Summary | |
---|---|
PROCLUS(Parameterization config) | Constructor, adhering to Parameterizable. |
Method Summary | |
---|---|
private Map<Integer,PROCLUS.PROCLUSCluster> | assignPoints(Map<Integer,Set<Integer>> dimensions, Database<V> database): Assigns the objects to the clusters. |
private double | avgDistance(V centroid, Set<Integer> objectIDs, Database<V> database, int dimension): Computes the average distance of the objects to the centroid along the specified dimension. |
private Set<Integer> | computeBadMedoids(Map<Integer,PROCLUS.PROCLUSCluster> clusters, int threshold): Computes the bad medoids; the medoid of a cluster with fewer objects than the specified threshold is bad. |
private Set<Integer> | computeM_current(Set<Integer> m, Set<Integer> m_best, Set<Integer> m_bad): Computes the set of medoids in the current iteration. |
private double | evaluateClusters(Map<Integer,PROCLUS.PROCLUSCluster> clusters, Map<Integer,Set<Integer>> dimensions, Database<V> database): Evaluates the quality of the clusters. |
private Map<Integer,Set<Integer>> | findDimensions(Set<Integer> medoids, Database<V> database, Map<Integer,List<DistanceResultPair<DoubleDistance>>> localities): Determines the set of correlated dimensions for each medoid in the specified medoid set. |
private Map<Integer,List<DistanceResultPair<DoubleDistance>>> | getLocalities(Set<Integer> m_c, Database<V> database): Computes the localities of the specified medoids. |
private Set<Integer> | greedy(Set<Integer> sampleSet, int m): Returns a piercing set of m medoids from the specified sample set. |
private Set<Integer> | initialSet(Set<Integer> sampleSet, int k): Returns a set of k elements from the specified sample set. |
private DoubleDistance | manhattanSegmentalDistance(V o1, V o2, Set<Integer> dimensions): Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
protected Clustering<Model> | runInTime(Database<V> database): Performs the PROCLUS algorithm on the given database. |
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.ProjectedClustering |
---|
getDistanceFunction, getK_i, getK, getL |
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm |
---|
isTime, isVerbose, run, setTime, setVerbose |
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
---|
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm |
---|
run |
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm |
---|
setTime, setVerbose |
Field Detail |
---|
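public static final OptionID M_I_ID
OptionID for M_I_PARAM.
private final IntParameter M_I_PARAM
Parameter to specify the multiplier for the initial number of medoids, must be an integer greater than 0.
Default value: 10
Key: -proclus.mi
private int m_i
Holds the value of M_I_PARAM.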
Constructor Detail |
---|
public PROCLUS(Parameterization config)
Constructor, adhering to Parameterizable.
Parameters:
config - Parameterization
Method Detail |
---|
protected Clustering<Model> runInTime(Database<V> database) throws IllegalStateException
Performs the PROCLUS algorithm on the given database.
Overrides: runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<Model>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method has not been called).
private Set<Integer> greedy(Set<Integer> sampleSet, int m)
Returns a piercing set of m medoids from the specified sample set.
Parameters:
sampleSet - the sample set
m - the number of medoids to be returned
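The page does not show how the piercing set is built. The referenced paper describes a greedy farthest-first selection: start from a random object, then repeatedly add the object that is farthest from all medoids chosen so far. The following is a minimal sketch of that idea, not the ELKI code. The class name `GreedySketch`, the `Map<Integer, double[]>` stand-in for the sample and database, the explicit `Random` argument, and the Euclidean distance are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class GreedySketch {
    /** Picks m well-separated object IDs from the sample (farthest-first traversal). */
    static Set<Integer> greedy(Map<Integer, double[]> sample, int m, Random random) {
        if (m < 1 || m > sample.size()) {
            throw new IllegalArgumentException("m must be in [1, sample size]");
        }
        List<Integer> ids = new ArrayList<>(sample.keySet());
        Set<Integer> medoids = new LinkedHashSet<>();

        // Start from a randomly chosen object.
        Integer first = ids.get(random.nextInt(ids.size()));
        medoids.add(first);

        // dist.get(id) holds the distance from id to its closest chosen medoid.
        Map<Integer, Double> dist = new HashMap<>();
        for (Integer id : ids) {
            dist.put(id, euclidean(sample.get(id), sample.get(first)));
        }

        while (medoids.size() < m) {
            // Pick the unchosen object that is farthest from every chosen medoid.
            Integer farthest = null;
            double best = -1.0;
            for (Integer id : ids) {
                if (!medoids.contains(id) && dist.get(id) > best) {
                    best = dist.get(id);
                    farthest = id;
                }
            }
            medoids.add(farthest);
            // Refresh the closest-medoid distances with the new medoid.
            for (Integer id : ids) {
                double d = euclidean(sample.get(id), sample.get(farthest));
                dist.put(id, Math.min(dist.get(id), d));
            }
        }
        return medoids;
    }

    // Assumption: Euclidean distance; the real class uses getDistanceFunction().
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Passing the `Random` instance explicitly keeps the sketch reproducible in tests; the real method has no such parameter.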
private Set<Integer> initialSet(Set<Integer> sampleSet, int k)
Returns a set of k elements from the specified sample set.
Parameters:
sampleSet - the sample set
k - the number of samples to be returned
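One simple way to draw k elements from a set of IDs is to shuffle a copy and keep the first k. The sketch below assumes a uniform random choice is intended, which the description above does not state; `InitialSetSketch` and the `Random` parameter are hypothetical names added for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class InitialSetSketch {
    /** Returns k distinct IDs drawn uniformly at random from sampleSet. */
    static Set<Integer> initialSet(Set<Integer> sampleSet, int k, Random random) {
        if (k < 0 || k > sampleSet.size()) {
            throw new IllegalArgumentException("k must be in [0, sample size]");
        }
        // Shuffle a copy so the caller's set is left untouched.
        List<Integer> ids = new ArrayList<>(sampleSet);
        Collections.shuffle(ids, random);
        return new HashSet<>(ids.subList(0, k));
    }
}
```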
private Set<Integer> computeM_current(Set<Integer> m, Set<Integer> m_best, Set<Integer> m_bad)
Computes the set of medoids in the current iteration.
Parameters:
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
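In the iterative phase described in the referenced paper, the current medoid set is obtained from the best set found so far by swapping each bad medoid for a random candidate from the full medoid pool. The sketch below follows that description under stated assumptions: replacements are drawn only from candidates in `m` that are not already in `m_best`, and the `Random` argument and class name `CurrentMedoidsSketch` are hypothetical. The actual ELKI method may choose replacements differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class CurrentMedoidsSketch {
    /** Copies mBest, replacing every bad medoid with an unused candidate from m. */
    static Set<Integer> computeM_current(Set<Integer> m, Set<Integer> mBest,
                                         Set<Integer> mBad, Random random) {
        // Candidates: pool members that are not part of the best set.
        List<Integer> candidates = new ArrayList<>(m);
        candidates.removeAll(mBest);
        Collections.shuffle(candidates, random);
        Iterator<Integer> replacements = candidates.iterator();

        Set<Integer> current = new HashSet<>();
        for (Integer medoid : mBest) {
            if (mBad.contains(medoid)) {
                if (!replacements.hasNext()) {
                    throw new IllegalStateException("not enough replacement candidates");
                }
                current.add(replacements.next());
            } else {
                current.add(medoid);
            }
        }
        return current;
    }
}
```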
private Map<Integer,List<DistanceResultPair<DoubleDistance>>> getLocalities(Set<Integer> m_c, Database<V> database)
Computes the localities of the specified medoids.
Parameters:
m_c - the ids of the medoids
database - the database holding the objects
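In the referenced paper, the locality of a medoid is the set of objects that lie within the distance from that medoid to its nearest other medoid. The following sketch illustrates this under several assumptions: the database is a plain `Map<Integer, double[]>`, distances are Euclidean, the medoid itself is left out of its locality, and the result holds only IDs rather than distance/ID pairs. `LocalitiesSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class LocalitiesSketch {
    /** Maps each medoid ID to the IDs of the objects inside its locality. */
    static Map<Integer, List<Integer>> getLocalities(Set<Integer> medoids,
                                                     Map<Integer, double[]> database) {
        Map<Integer, List<Integer>> result = new HashMap<>();
        for (Integer medoid : medoids) {
            double[] center = database.get(medoid);
            // delta = distance to the nearest other medoid (infinite if there is none).
            double delta = Double.POSITIVE_INFINITY;
            for (Integer other : medoids) {
                if (!other.equals(medoid)) {
                    delta = Math.min(delta, euclidean(center, database.get(other)));
                }
            }
            // Collect every other object within delta of the medoid.
            List<Integer> locality = new ArrayList<>();
            for (Map.Entry<Integer, double[]> entry : database.entrySet()) {
                if (!entry.getKey().equals(medoid)
                        && euclidean(center, entry.getValue()) <= delta) {
                    locality.add(entry.getKey());
                }
            }
            result.put(medoid, locality);
        }
        return result;
    }

    // Assumption: Euclidean distance in the full space.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```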
private Map<Integer,Set<Integer>> findDimensions(Set<Integer> medoids, Database<V> database, Map<Integer,List<DistanceResultPair<DoubleDistance>>> localities)
Determines the set of correlated dimensions for each medoid in the specified medoid set.
Parameters:
medoids - the set of medoids
database - the database containing the objects
localities - the localities of the specified medoids
private Map<Integer,PROCLUS.PROCLUSCluster> assignPoints(Map<Integer,Set<Integer>> dimensions, Database<V> database)
Assigns the objects to the clusters.
Parameters:
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects
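A plausible reading of the assignment step, following the referenced paper, is that each object joins the medoid with the smallest Manhattan segmental distance, measured only over that medoid's own dimensions. The sketch below assumes this rule, uses a `Map<Integer, double[]>` in place of `Database<V>`, returns plain member sets instead of `PROCLUSCluster` objects, and indexes dimensions from 0. `AssignPointsSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class AssignPointsSketch {
    /** Maps each medoid ID to the IDs of the objects assigned to it. */
    static Map<Integer, Set<Integer>> assignPoints(Map<Integer, Set<Integer>> dimensions,
                                                   Map<Integer, double[]> database) {
        Map<Integer, Set<Integer>> clusters = new HashMap<>();
        for (Integer medoid : dimensions.keySet()) {
            clusters.put(medoid, new HashSet<>());
        }
        for (Map.Entry<Integer, double[]> object : database.entrySet()) {
            Integer best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (Map.Entry<Integer, Set<Integer>> entry : dimensions.entrySet()) {
                // Distance is measured in the subspace of the candidate medoid.
                double d = segmental(object.getValue(),
                        database.get(entry.getKey()), entry.getValue());
                if (d < bestDistance) {
                    bestDistance = d;
                    best = entry.getKey();
                }
            }
            if (best != null) {
                clusters.get(best).add(object.getKey());
            }
        }
        return clusters;
    }

    // Mean absolute difference over the given (non-empty, 0-based) dimensions.
    static double segmental(double[] a, double[] b, Set<Integer> dims) {
        double sum = 0.0;
        for (int d : dims) {
            sum += Math.abs(a[d] - b[d]);
        }
        return sum / dims.size();
    }
}
```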
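private DoubleDistance manhattanSegmentalDistance(V o1, V o2, Set<Integer> dimensions)
Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions.
Parameters:
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered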
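The Manhattan segmental distance is the Manhattan distance restricted to the given dimensions, divided by the number of those dimensions. A minimal sketch on plain `double[]` vectors follows; it returns a `double` rather than a `DoubleDistance`, assumes 0-based dimension indices, and uses the hypothetical class name `SegmentalDistanceSketch`.

```java
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class SegmentalDistanceSketch {
    /** Mean absolute coordinate difference over the selected dimensions. */
    static double manhattanSegmentalDistance(double[] o1, double[] o2,
                                             Set<Integer> dimensions) {
        if (dimensions.isEmpty()) {
            throw new IllegalArgumentException("at least one dimension is required");
        }
        double sum = 0.0;
        for (int d : dimensions) {
            sum += Math.abs(o1[d] - o2[d]);
        }
        // Normalizing by |D| makes distances in subspaces of different size comparable.
        return sum / dimensions.size();
    }
}
```

For o1 = (1, 5, 10), o2 = (4, 5, 2) and dimensions {0, 2}, the result is (3 + 8) / 2 = 5.5.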
private double evaluateClusters(Map<Integer,PROCLUS.PROCLUSCluster> clusters, Map<Integer,Set<Integer>> dimensions, Database<V> database)
Evaluates the quality of the clusters.
Parameters:
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects
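The page does not state the quality measure. The referenced paper scores a clustering by averaging, for each cluster, the mean distance of its members to the cluster centroid along each of the cluster's dimensions, and then weighting each cluster by its size (lower is better). The sketch below implements that measure under the assumption that the ELKI method follows the paper; clusters are represented as plain member-ID sets, the database as a `Map<Integer, double[]>`, dimensions are 0-based, and `EvaluateClustersSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class EvaluateClustersSketch {
    /** Size-weighted average per-dimension deviation from the centroids; lower is better. */
    static double evaluateClusters(Map<Integer, Set<Integer>> clusters,
                                   Map<Integer, Set<Integer>> dimensions,
                                   Map<Integer, double[]> database) {
        double weighted = 0.0;
        for (Map.Entry<Integer, Set<Integer>> entry : clusters.entrySet()) {
            Set<Integer> members = entry.getValue();
            Set<Integer> dims = dimensions.get(entry.getKey());
            if (members.isEmpty() || dims.isEmpty()) {
                continue; // nothing to measure for this cluster
            }
            double w = 0.0;
            for (int dim : dims) {
                // Centroid coordinate of the cluster along this dimension.
                double centroid = 0.0;
                for (int id : members) {
                    centroid += database.get(id)[dim];
                }
                centroid /= members.size();
                // Average distance of the members to the centroid along this dimension.
                double y = 0.0;
                for (int id : members) {
                    y += Math.abs(database.get(id)[dim] - centroid);
                }
                w += y / members.size();
            }
            weighted += members.size() * (w / dims.size());
        }
        return weighted / database.size();
    }
}
```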
private double avgDistance(V centroid, Set<Integer> objectIDs, Database<V> database, int dimension)
Computes the average distance of the objects to the centroid along the specified dimension.
Parameters:
centroid - the centroid
objectIDs - the set of object ids
database - the database holding the objects
dimension - the dimension for which the average distance is computed
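As an illustration, the average distance along one dimension can be computed as the mean absolute difference between each object's coordinate and the centroid's coordinate in that dimension. The sketch below assumes a `Map<Integer, double[]>` in place of `Database<V>` and a 0-based dimension index; `AvgDistanceSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class AvgDistanceSketch {
    /** Mean absolute deviation of the objects from the centroid along one dimension. */
    static double avgDistance(double[] centroid, Set<Integer> objectIDs,
                              Map<Integer, double[]> database, int dimension) {
        if (objectIDs.isEmpty()) {
            throw new IllegalArgumentException("objectIDs must not be empty");
        }
        double sum = 0.0;
        for (int id : objectIDs) {
            sum += Math.abs(database.get(id)[dimension] - centroid[dimension]);
        }
        return sum / objectIDs.size();
    }
}
```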
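private Set<Integer> computeBadMedoids(Map<Integer,PROCLUS.PROCLUSCluster> clusters, int threshold)
Computes the bad medoids; the medoid of a cluster with fewer objects than the specified threshold is bad.
Parameters:
clusters - the clusters
threshold - the threshold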
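The rule stated above can be sketched in a few lines: a medoid is bad when its cluster has fewer members than the threshold. The sketch represents each cluster as a plain set of member IDs keyed by medoid ID instead of a `PROCLUSCluster`; `BadMedoidsSketch` is a hypothetical name, and any additional criteria the real method may apply are not covered.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not the ELKI implementation.
final class BadMedoidsSketch {
    /** Returns the medoid IDs whose clusters hold fewer than threshold objects. */
    static Set<Integer> computeBadMedoids(Map<Integer, Set<Integer>> clusters,
                                          int threshold) {
        Set<Integer> bad = new HashSet<>();
        for (Map.Entry<Integer, Set<Integer>> entry : clusters.entrySet()) {
            if (entry.getValue().size() < threshold) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }
}
```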