V - the type of NumberVector handled by this Algorithm
@Title(value="PROCLUS: PROjected CLUStering")
@Description(value="Algorithm to find subspace clusters in high dimensional spaces.")
@Reference(authors="C. C. Aggarwal, C. Procopiuc, J. L. Wolf, P. S. Yu, J. S. Park", title="Fast Algorithms for Projected Clustering", booktitle="Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD '99)", url="http://dx.doi.org/10.1145/304181.304188")
public class PROCLUS<V extends NumberVector>
extends AbstractProjectedClustering<Clustering<SubspaceModel>,V>
implements SubspaceClusteringAlgorithm<SubspaceModel>
Modifier and Type | Class and Description |
---|---|
private static class | PROCLUS.DoubleIntInt: Simple triple. |
static class | PROCLUS.Parameterizer<V extends NumberVector>: Parameterization class. |
private class | PROCLUS.PROCLUSCluster: Encapsulates the attributes of a cluster. |
Modifier and Type | Field and Description |
---|---|
private static Logging | LOG: The logger for this class. |
private int | m_i: Multiplier for the initial number of medoids. |
private RandomFactory | rnd: Random generator. |
Fields inherited from class AbstractProjectedClustering: k, k_i, l
Constructor and Description |
---|
PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd): Java constructor. |
Modifier and Type | Method and Description |
---|---|
private ArrayList<PROCLUS.PROCLUSCluster> | assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<V> database): Assigns the objects to the clusters. |
private double | avgDistance(Vector centroid, DBIDs objectIDs, Relation<V> database, int dimension): Computes the average distance of the objects to the centroid along the specified dimension. |
private DBIDs | computeBadMedoids(ArrayDBIDs m_current, ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold): Computes the bad medoids; a medoid is bad if its cluster contains fewer objects than the specified threshold. |
private ArrayDBIDs | computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, Random random): Computes the set of medoids in the current iteration. |
private double | evaluateClusters(ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<V> database): Evaluates the quality of the clusters. |
private List<PROCLUS.PROCLUSCluster> | finalAssignment(List<Pair<Vector,long[]>> dimensions, Relation<V> database): Refinement step to assign the objects to the final clusters. |
private long[][] | findDimensions(ArrayDBIDs medoids, Relation<V> database, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery): Determines the set of correlated dimensions for each medoid in the specified medoid set. |
private List<Pair<Vector,long[]>> | findDimensions(ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<V> database): Refinement step that determines the set of correlated dimensions for each cluster centroid. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
private DataStore<DoubleDBIDList> | getLocalities(DBIDs medoids, Relation<V> database, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery): Computes the localities of the specified medoids: for each medoid m, the objects in the sphere centered at m with radius minDist are determined, where minDist is the minimum distance between m and any other medoid m_i. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
private ArrayDBIDs | greedy(DistanceQuery<V> distFunc, DBIDs sampleSet, int m, Random random): Returns a piercing set of m medoids from the specified sample set. |
private ArrayDBIDs | initialSet(DBIDs sampleSet, int k, Random random): Returns a set of k elements from the specified sample set. |
private double | manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions): Returns the Manhattan segmental distance between o1 and o2 relative to the specified dimensions. |
Clustering<SubspaceModel> | run(Database database, Relation<V> relation): Performs the PROCLUS algorithm on the given database. |
getDistanceFunction, getDistanceQuery
makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
private int m_i
private RandomFactory rnd
public PROCLUS(int k, int k_i, int l, int m_i, RandomFactory rnd)
k - k Parameter
k_i - k_i Parameter
l - l Parameter
m_i - m_i Parameter
rnd - Random generator
public Clustering<SubspaceModel> run(Database database, Relation<V> relation)
database - Database to process
relation - Relation to process
private ArrayDBIDs greedy(DistanceQuery<V> distFunc, DBIDs sampleSet, int m, Random random)
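A piercing set for `greedy` can be obtained by farthest-first traversal: start from a random sample point, then repeatedly add the point whose distance to its nearest already-chosen medoid is largest. The sketch below is only an illustration of that idea, not the ELKI code. The class `GreedyMedoidSketch`, the use of plain `double[][]` arrays and array indices instead of `DBIDs`, and the Euclidean helper are assumptions made so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; names and data layout are assumptions.
final class GreedyMedoidSketch {
  /** Euclidean distance; stands in for the DistanceQuery of the real method. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Returns the indices of m well-separated points of the sample. */
  static List<Integer> greedy(double[][] sample, int m, Random random) {
    int n = sample.length;
    if (m < 1 || m > n) {
      throw new IllegalArgumentException("m must be in [1, sample.length]");
    }
    List<Integer> medoids = new ArrayList<>(m);
    boolean[] chosen = new boolean[n];
    // minDist[i] = distance from point i to its nearest chosen medoid so far.
    double[] minDist = new double[n];
    Arrays.fill(minDist, Double.POSITIVE_INFINITY);

    int current = random.nextInt(n); // the first medoid is picked at random
    medoids.add(current);
    chosen[current] = true;

    while (medoids.size() < m) {
      int best = -1;
      for (int i = 0; i < n; i++) {
        if (chosen[i]) {
          continue;
        }
        // Only the most recently added medoid can shrink the minimum.
        minDist[i] = Math.min(minDist[i], distance(sample[i], sample[current]));
        if (best < 0 || minDist[i] > minDist[best]) {
          best = i;
        }
      }
      current = best; // the point farthest from all chosen medoids
      medoids.add(current);
      chosen[current] = true;
    }
    return medoids;
  }
}
```

Keeping a running minimum per point means each round only needs distances to the medoid added last, so selecting m medoids from n points costs on the order of n times m distance computations.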
distFunc - the distance function
sampleSet - the sample set
m - the number of medoids to be returned
random - random number generator
private ArrayDBIDs initialSet(DBIDs sampleSet, int k, Random random)
sampleSet - the sample set
k - the number of samples to be returned
random - random number generator
private ArrayDBIDs computeM_current(DBIDs m, DBIDs m_best, DBIDs m_bad, Random random)
m - the medoids
m_best - the best set of medoids found so far
m_bad - the bad medoids
random - random number generator
private DataStore<DoubleDBIDList> getLocalities(DBIDs medoids, Relation<V> database, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery)
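The locality of a medoid m is the set of objects inside the sphere around m whose radius minDist is the distance from m to its nearest other medoid. A minimal sketch of that computation follows. It assumes plain `double[][]` data, array indices in place of `DBIDs`, a brute-force scan in place of the `RangeQuery`, Euclidean distance, and an inclusive radius; the class `LocalitySketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; names and data layout are assumptions.
final class LocalitySketch {
  /** Euclidean distance; stands in for the DistanceQuery of the real method. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * For each medoid (given as an index into data), returns the indices of all
   * points within minDist of it, where minDist is the distance to the nearest
   * other medoid. The medoid itself is part of its own locality.
   */
  static List<List<Integer>> localities(double[][] data, int[] medoids) {
    List<List<Integer>> result = new ArrayList<>(medoids.length);
    for (int i = 0; i < medoids.length; i++) {
      double[] mi = data[medoids[i]];
      // Radius: minimum distance to any other medoid
      // (stays infinite if there is only one medoid).
      double minDist = Double.POSITIVE_INFINITY;
      for (int j = 0; j < medoids.length; j++) {
        if (j != i) {
          minDist = Math.min(minDist, distance(mi, data[medoids[j]]));
        }
      }
      // Brute-force range scan; an index-backed range query would replace this.
      List<Integer> locality = new ArrayList<>();
      for (int p = 0; p < data.length; p++) {
        if (distance(mi, data[p]) <= minDist) {
          locality.add(p);
        }
      }
      result.add(locality);
    }
    return result;
  }
}
```

With an inclusive radius, the nearest other medoid always falls inside the locality; whether the boundary is included is a choice made for this sketch.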
medoids - the ids of the medoids
database - the database holding the objects
distFunc - the distance function
private long[][] findDimensions(ArrayDBIDs medoids, Relation<V> database, DistanceQuery<V> distFunc, RangeQuery<V> rangeQuery)
medoids - the set of medoids
database - the database containing the objects
distFunc - the distance function
private List<Pair<Vector,long[]>> findDimensions(ArrayList<PROCLUS.PROCLUSCluster> clusters, Relation<V> database)
clusters - the list of clusters
database - the database containing the objects
private ArrayList<PROCLUS.PROCLUSCluster> assignPoints(ArrayDBIDs m_current, long[][] dimensions, Relation<V> database)
m_current - Current centers
dimensions - set of correlated dimensions for each medoid of the cluster
database - the database containing the objects
private List<PROCLUS.PROCLUSCluster> finalAssignment(List<Pair<Vector,long[]>> dimensions, Relation<V> database)
dimensions - pair containing the centroid and the set of correlated dimensions for the centroid
database - the database containing the objects
private double manhattanSegmentalDistance(NumberVector o1, NumberVector o2, long[] dimensions)
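The Manhattan segmental distance relative to a dimension set D is the Manhattan distance restricted to D and divided by the number of dimensions in D, so that distances over subspaces of different sizes remain comparable. The sketch below illustrates this. It assumes that the `long[]` argument is a bitmask in which bit d selects dimension d (the layout read by `java.util.BitSet.valueOf`), and it uses `double[]` in place of `NumberVector`; the class `ManhattanSegmentalSketch` is hypothetical.

```java
import java.util.BitSet;

// Hypothetical stand-alone sketch; names and data layout are assumptions.
final class ManhattanSegmentalSketch {
  /**
   * Sums |o1[d] - o2[d]| over the selected dimensions and divides by the
   * number of selected dimensions.
   */
  static double distance(double[] o1, double[] o2, long[] dimensions) {
    // Assumption: bit d of the mask is set if dimension d is selected.
    BitSet dims = BitSet.valueOf(dimensions);
    int count = dims.cardinality();
    if (count == 0) {
      throw new IllegalArgumentException("no dimensions selected");
    }
    double sum = 0.0;
    for (int d = dims.nextSetBit(0); d >= 0; d = dims.nextSetBit(d + 1)) {
      sum += Math.abs(o1[d] - o2[d]);
    }
    return sum / count;
  }
}
```

For o1 = (1, 2, 3, 4) and o2 = (2, 4, 3, 0) with dimensions 0 and 3 selected (mask `0b1001`), the sum is 1 + 4 = 5 and the distance is 5 / 2 = 2.5.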
o1 - the first object
o2 - the second object
dimensions - the dimensions to be considered
private double evaluateClusters(ArrayList<PROCLUS.PROCLUSCluster> clusters, long[][] dimensions, Relation<V> database)
clusters - the clusters to be evaluated
dimensions - the dimensions associated with each cluster
database - the database holding the objects
private double avgDistance(Vector centroid, DBIDs objectIDs, Relation<V> database, int dimension)
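The average distance along one dimension can be read as the mean absolute difference between each object's coordinate and the centroid's coordinate in that dimension. The following sketch shows that reading. The absolute-difference interpretation, the `double[][]` object array in place of `DBIDs` plus `Relation`, and the class `AvgDistanceSketch` are assumptions.

```java
// Hypothetical stand-alone sketch; names and data layout are assumptions.
final class AvgDistanceSketch {
  /** Mean of |o[dimension] - centroid[dimension]| over all objects. */
  static double avgDistance(double[] centroid, double[][] objects, int dimension) {
    if (objects.length == 0) {
      throw new IllegalArgumentException("no objects");
    }
    double sum = 0.0;
    for (double[] o : objects) {
      sum += Math.abs(o[dimension] - centroid[dimension]);
    }
    return sum / objects.length;
  }
}
```

A small per-dimension average indicates that the objects lie close to the centroid along that dimension, which is what makes the dimension a candidate for the cluster's subspace.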
centroid - the centroid
objectIDs - the set of object IDs
database - the database holding the objects
dimension - the dimension for which the average distance is computed
private DBIDs computeBadMedoids(ArrayDBIDs m_current, ArrayList<PROCLUS.PROCLUSCluster> clusters, int threshold)
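The rule for bad medoids, as described in the method summary, marks a medoid as bad when its cluster holds fewer objects than the threshold. A minimal sketch of that rule follows. It represents the clusters only by their sizes and identifies each medoid by its position; the class `BadMedoidSketch` and this data layout are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; names and data layout are assumptions.
final class BadMedoidSketch {
  /**
   * Returns the positions of all medoids whose cluster contains fewer than
   * threshold objects; clusterSizes[i] is the size of medoid i's cluster.
   */
  static List<Integer> badMedoids(int[] clusterSizes, int threshold) {
    List<Integer> bad = new ArrayList<>();
    for (int i = 0; i < clusterSizes.length; i++) {
      if (clusterSizes[i] < threshold) {
        bad.add(i);
      }
    }
    return bad;
  }
}
```

With cluster sizes 50, 3, 20 and 0 and a threshold of 10, the medoids at positions 1 and 3 are reported as bad.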
m_current - Current medoids
clusters - the clusters
threshold - the threshold
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<SubspaceModel>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<Clustering<SubspaceModel>>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.