V
- the type of NumberVector handled by this Algorithm.@Title(value="P3C: A Robust Projected Clustering Algorithm.") @Reference(authors="Gabriela Moise, J\u00f6rg Sander, Martin Ester", title="P3C: A Robust Projected Clustering Algorithm", booktitle="Proc. Sixth International Conference on Data Mining (ICDM \'06)", url="http://dx.doi.org/10.1109/ICDM.2006.123") public class P3C<V extends NumberVector> extends AbstractAlgorithm<Clustering<SubspaceModel>> implements SubspaceClusteringAlgorithm<SubspaceModel>
Reference:
Gabriela Moise, Jörg Sander, Martin Ester
P3C: A Robust Projected Clustering Algorithm.
In: Proc. Sixth International Conference on Data Mining (ICDM '06)
Modifier and Type | Class and Description |
---|---|
private static class |
P3C.ClusterCandidate
This class is used to represent potential clusters.
|
static class |
P3C.Parameterizer<V extends NumberVector>
Parameterization class.
|
private static class |
P3C.Signature
P3C Cluster signature.
|
Modifier and Type | Field and Description |
---|---|
protected double |
alpha
Alpha threshold for testing.
|
protected double |
emDelta
Threshold when to stop EM iterations.
|
private static Logging |
LOG
The logger for this class.
|
protected int |
maxEmIterations
Maximum number of iterations for the EM step.
|
protected int |
minClusterSize
Minimum cluster size for noise flagging.
|
protected double |
poissonThreshold
Parameter for the Poisson test threshold.
|
Constructor and Description |
---|
P3C(double alpha,
double poissonThreshold,
int maxEmIterations,
double emDelta,
int minClusterSize)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private void |
assignUnassigned(Relation<V> relation,
WritableDataStore<double[]> probClusterIGivenX,
List<MultivariateGaussianModel> models,
ModifiableDBIDs unassigned)
Assign unassigned objects to best candidate based on shortest Mahalanobis
distance.
|
private int |
chiSquaredUniformTest(SetDBIDs[] parts,
long[] marked,
int card)
Performs a ChiSquared test to determine whether an attribute has a uniform
distribution.
|
private void |
computeFuzzyMembership(Relation<V> relation,
ArrayList<P3C.Signature> clusterCores,
ModifiableDBIDs unassigned,
WritableDataStore<double[]> probClusterIGivenX,
List<MultivariateGaussianModel> models,
int dim)
Computes a fuzzy membership with the weights based on which cluster cores
each data point is part of.
|
private ArrayList<P3C.Signature> |
constructOneSignatures(SetDBIDs[][] partitions,
long[][] markers)
Construct the 1-signatures by merging adjacent dense bins.
|
private void |
findOutliers(Relation<V> relation,
List<MultivariateGaussianModel> models,
ArrayList<P3C.ClusterCandidate> clusterCandidates,
ModifiableDBIDs noise)
Performs outlier detection by testing the Mahalanobis distance of each
point in a cluster against the critical value of the ChiSquared
distribution with as many degrees of freedom as the cluster has relevant
attributes.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
private ArrayList<P3C.ClusterCandidate> |
hardClustering(WritableDataStore<double[]> probClusterIGivenX,
List<P3C.Signature> clusterCores,
DBIDs dbids)
Creates a hard clustering from the specified soft membership matrix.
|
private ArrayList<P3C.Signature> |
mergeClusterCores(int binCount,
ArrayList<P3C.Signature> signatures)
Merge 1-signatures into p-signatures.
|
protected P3C.Signature |
mergeSignatures(P3C.Signature first,
P3C.Signature second,
int numBins)
Generates a merged signature of this and another one, where the other
signature must be a 1-signature.
|
private SetDBIDs[][] |
partitionData(Relation<V> relation,
int bins)
Partition the data set into
bins bins in each dimension
independently. |
private ArrayList<P3C.Signature> |
pruneRedundantClusterCores(ArrayList<P3C.Signature> clusterCores) |
Clustering<SubspaceModel> |
run(Database database,
Relation<V> relation)
Performs the P3C algorithm on the given Database.
|
protected HashSetModifiableDBIDs |
unionDBIDs(DBIDs[] parts,
int start,
int end)
Compute the union of multiple DBID sets.
|
makeParameterDistanceFunction, run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
protected double poissonThreshold
protected int maxEmIterations
protected double emDelta
protected int minClusterSize
protected double alpha
public P3C(double alpha, double poissonThreshold, int maxEmIterations, double emDelta, int minClusterSize)
alpha
- ChiSquared test thresholdpoissonThreshold
- Poisson test thresholdmaxEmIterations
- Maximum number of EM iterationsemDelta
- EM stopping thresholdminClusterSize
- Minimum cluster sizepublic Clustering<SubspaceModel> run(Database database, Relation<V> relation)
private ArrayList<P3C.Signature> constructOneSignatures(SetDBIDs[][] partitions, long[][] markers)
partitions
- Initial partitions.markers
- Markers for dense partitions.private ArrayList<P3C.Signature> mergeClusterCores(int binCount, ArrayList<P3C.Signature> signatures)
binCount
- Number of bins in each dimension.signatures
- 1-signaturesprivate ArrayList<P3C.Signature> pruneRedundantClusterCores(ArrayList<P3C.Signature> clusterCores)
private SetDBIDs[][] partitionData(Relation<V> relation, int bins)
bins
bins in each dimension
independently.
This can be used to construct a grid approximation of the data using O(d n)
memory.
When a dimension is found to be constant, it will not be partitioned, but
instead the corresponding array will be set to null
.relation
- Data relation to partitionbins
- Number of binsprotected HashSetModifiableDBIDs unionDBIDs(DBIDs[] parts, int start, int end)
parts
- Parts arraystart
- Array start indexend
- Array end index (exclusive)private int chiSquaredUniformTest(SetDBIDs[] parts, long[] marked, int card)
parts
- Data partitions.marked
- the marked bins that should be ignored.card
- Cardinalityprivate void computeFuzzyMembership(Relation<V> relation, ArrayList<P3C.Signature> clusterCores, ModifiableDBIDs unassigned, WritableDataStore<double[]> probClusterIGivenX, List<MultivariateGaussianModel> models, int dim)
relation
- Data relationclusterCores
- the cluster cores.unassigned
- set to which to add unassigned points.probClusterIGivenX
- Membership probabilities.models
- Cluster models.dim
- Dimensionalityprivate void assignUnassigned(Relation<V> relation, WritableDataStore<double[]> probClusterIGivenX, List<MultivariateGaussianModel> models, ModifiableDBIDs unassigned)
relation
- Data relationprobClusterIGivenX
- fuzzy membership matrix.models
- Cluster models.unassigned
- the list of points not yet assigned.private ArrayList<P3C.ClusterCandidate> hardClustering(WritableDataStore<double[]> probClusterIGivenX, List<P3C.Signature> clusterCores, DBIDs dbids)
probClusterIGivenX
- the membership matrix.dbids
- mapping matrix row to DBID.private void findOutliers(Relation<V> relation, List<MultivariateGaussianModel> models, ArrayList<P3C.ClusterCandidate> clusterCandidates, ModifiableDBIDs noise)
relation
- Data relationmodels
- Cluster modelsclusterCandidates
- the list of clusters to check.noise
- the set to which to add points deemed outliers.protected P3C.Signature mergeSignatures(P3C.Signature first, P3C.Signature second, int numBins)
first
- First signature.second
- Second signature, must be a 1-signature.numBins
- Number of bins per dimension.public TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<Clustering<SubspaceModel>>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<Clustering<SubspaceModel>>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.