O - the type of data objects handled by this algorithm
@Title(value="Discovering cluster-based local outliers")
@Reference(authors="Z. He, X. Xu, S. Deng", title="Discovering cluster-based local outliers", booktitle="Pattern Recognition Letters 24(9-10)", url="https://doi.org/10.1016/S0167-8655(03)00003-5", bibkey="DBLP:journals/prl/HeXD03")
public class CBLOF<O extends NumberVector> extends AbstractDistanceBasedAlgorithm<O,OutlierResult> implements OutlierAlgorithm
Reference:
Z. He, X. Xu, S. Deng
Discovering cluster-based local outliers
Pattern Recognition Letters 24(9-10)
Implementation note: this algorithm is hard to implement generically, that is, with support for arbitrary clustering algorithms and distances, because it is not trivial to ensure that the clustering algorithm and the outlier method use compatible data types and distances.
Modifier and Type | Class and Description |
---|---|
static class | CBLOF.Parameterizer<O extends NumberVector> - Parameterization class. |
Modifier and Type | Field and Description |
---|---|
protected double | alpha - The ratio of the size that separates the large clusters from the small clusters. |
protected double | beta - The minimal ratio between two consecutive clusters (when ordered descending by size) at which the boundary between the large and small clusters is set. |
protected ClusteringAlgorithm<Clustering<MeanModel>> | clusteringAlgorithm - The clustering algorithm to use. |
protected NumberVectorDistanceFunction<? super O> | distance - Distance function to use. |
private static Logging | LOG - The logger for this class. |
ALGORITHM_ID
DISTANCE_FUNCTION_ID
Constructor and Description |
---|
CBLOF(NumberVectorDistanceFunction<? super O> distanceFunction, ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm, double alpha, double beta) - Constructor. |
Modifier and Type | Method and Description |
---|---|
private void | computeCBLOFs(Relation<O> relation, NumberVectorDistanceFunction<? super O> distance, WritableDoubleDataStore cblofs, DoubleMinMax cblofMinMax, java.util.List<? extends Cluster<MeanModel>> largeClusters, java.util.List<? extends Cluster<MeanModel>> smallClusters) - Compute the CBLOF scores for all the data. |
private double | computeLargeClusterCBLOF(O obj, NumberVectorDistanceFunction<? super O> distanceQuery, NumberVector clusterMean, Cluster<MeanModel> cluster) |
private double | computeSmallClusterCBLOF(O obj, NumberVectorDistanceFunction<? super O> distance, java.util.List<NumberVector> largeClusterMeans, Cluster<MeanModel> cluster) |
private int | getClusterBoundary(Relation<O> relation, java.util.List<? extends Cluster<MeanModel>> clusters) - Compute the boundary index separating the large cluster from the small cluster. |
TypeInformation[] | getInputTypeRestriction() - Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger() - Get the (STATIC) logger for this class. |
OutlierResult | run(Database database, Relation<O> relation) - Runs the CBLOF algorithm on the given database. |
private void | storeCBLOFScore(WritableDoubleDataStore cblofs, DoubleMinMax cblofMinMax, double cblof, DBIDIter iter) |
getDistanceFunction
run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
protected ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm
protected double alpha
protected double beta
protected NumberVectorDistanceFunction<? super O> distance
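The `alpha` and `beta` fields together determine which clusters count as "large". The following is a minimal standalone sketch of that boundary rule as described by the field documentation and the cited paper; it is not the ELKI source. The class name `CblofBoundarySketch`, the method `clusterBoundary`, and the use of a plain `int[]` of cluster sizes are assumptions made for illustration.

```java
/** Hypothetical standalone sketch of the large/small cluster boundary; not the ELKI code. */
final class CblofBoundarySketch {
  /**
   * @param sizes cluster sizes, sorted in descending order
   * @param alpha fraction of the data that the large clusters should cover
   * @param beta  minimal size ratio between two consecutive clusters
   * @return index of the last cluster counted as large (-1 for an empty input)
   */
  static int clusterBoundary(int[] sizes, double alpha, double beta) {
    int total = 0;
    for (int s : sizes) {
      total += s;
    }
    int cumulative = 0;
    for (int i = 0; i < sizes.length - 1; i++) {
      cumulative += sizes[i];
      // Criterion 1: clusters 0..i already cover the fraction alpha of the data.
      if (cumulative >= total * alpha) {
        return i;
      }
      // Criterion 2: the next cluster is smaller by at least the factor beta.
      if (sizes[i] / (double) sizes[i + 1] >= beta) {
        return i;
      }
    }
    // Neither criterion fired: treat every cluster as large.
    return sizes.length - 1;
  }

  public static void main(String[] args) {
    // The first two clusters cover 95% of the data, so the boundary index is 1.
    System.out.println(clusterBoundary(new int[] {500, 450, 30, 20}, 0.9, 5.0)); // prints 1
    // The size drops by a factor of 6 after the first cluster, so the boundary index is 0.
    System.out.println(clusterBoundary(new int[] {600, 100, 100, 100, 100}, 0.9, 5.0)); // prints 0
  }
}
```

Under these assumptions, clusters at indices up to and including the returned boundary are treated as large, and all later ones as small.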
public CBLOF(NumberVectorDistanceFunction<? super O> distanceFunction, ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm, double alpha, double beta)
distanceFunction - the neighborhood distance function
clusteringAlgorithm - the clustering algorithm
alpha - the ratio of the data that should be included in the large clusters
beta - the ratio of the sizes of the clusters at the boundary between the large and the small clusters
public OutlierResult run(Database database, Relation<O> relation)
database - Database to query
relation - Data to process
private int getClusterBoundary(Relation<O> relation, java.util.List<? extends Cluster<MeanModel>> clusters)
relation - Data to process
clusters - All clusters that were found
private void computeCBLOFs(Relation<O> relation, NumberVectorDistanceFunction<? super O> distance, WritableDoubleDataStore cblofs, DoubleMinMax cblofMinMax, java.util.List<? extends Cluster<MeanModel>> largeClusters, java.util.List<? extends Cluster<MeanModel>> smallClusters)
relation - Data to process
distance - The distance function
cblofs - CBLOF scores
cblofMinMax - Minimum/maximum score tracker
largeClusters - Large clusters output
smallClusters - Small clusters output
private void storeCBLOFScore(WritableDoubleDataStore cblofs, DoubleMinMax cblofMinMax, double cblof, DBIDIter iter)
private double computeSmallClusterCBLOF(O obj, NumberVectorDistanceFunction<? super O> distance, java.util.List<NumberVector> largeClusterMeans, Cluster<MeanModel> cluster)
private double computeLargeClusterCBLOF(O obj, NumberVectorDistanceFunction<? super O> distanceQuery, NumberVector clusterMean, Cluster<MeanModel> cluster)
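The two scoring methods above are undocumented here. As a rough illustration, the sketch below follows the score definition from the cited paper: a point in a large cluster is scored by its cluster size times its distance to that cluster, and a point in a small cluster by its cluster size times its distance to the nearest large cluster. It assumes that each cluster is represented by its mean and that the distance is Euclidean. The class `CblofScoreSketch` and its method names are hypothetical and not part of ELKI.

```java
import java.util.List;

/** Hypothetical standalone sketch of the CBLOF score; not the ELKI code. */
final class CblofScoreSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Point in a large cluster: cluster size times distance to its own cluster mean. */
  static double largeClusterScore(double[] point, double[] ownClusterMean, int clusterSize) {
    return clusterSize * euclidean(point, ownClusterMean);
  }

  /** Point in a small cluster: cluster size times distance to the nearest large cluster mean. */
  static double smallClusterScore(double[] point, List<double[]> largeClusterMeans, int clusterSize) {
    double min = Double.POSITIVE_INFINITY;
    for (double[] mean : largeClusterMeans) {
      min = Math.min(min, euclidean(point, mean));
    }
    return clusterSize * min;
  }
}
```

In this sketch the cluster size passed in is always the size of the cluster the point itself belongs to, also in the small-cluster case.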
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<OutlierResult>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<OutlierResult>
Copyright © 2019 ELKI Development Team. License information.