Type Parameters:
O - Object type

@Title(value="KDEOS: Kernel Density Estimator Outlier Score")
@Reference(authors="Erich Schubert, Arthur Zimek, Hans-Peter Kriegel", title="Generalized Outlier Detection with Flexible Kernel Density Estimates", booktitle="Proc. 14th SIAM International Conference on Data Mining (SDM 2014)", url="https://doi.org/10.1137/1.9781611973440.63", bibkey="DBLP:conf/sdm/SchubertZK14")
public class KDEOS<O> extends AbstractDistanceBasedAlgorithm<O,OutlierResult> implements OutlierAlgorithm
KDEOS is an outlier detection method inspired by LOF that uses kernel density estimation (KDE) from statistics. Unfortunately, kernel density estimation itself becomes difficult on higher-dimensional data. In that case, the kdeos.idim parameter can be useful: it either disables the dimensionality adjustment completely (0) or sets it to a dimensionality lower than that of the data representation. This may sound like a hack at first, but real data often has a low intrinsic dimensionality and is merely embedded in a higher-dimensional representation. Adjusting the kernel for the dimensionality of the representation seems to yield worse results than using the lower, intrinsic dimensionality.
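To make the dimensionality adjustment concrete, here is a minimal sketch. It is not the ELKI implementation: the class and method names are hypothetical, it assumes a Gaussian kernel with its one-dimensional normalization constant, and it assumes the adjustment is a factor of h^(-dim) for bandwidth h. With dim = 0 the factor is 1, which corresponds to disabling the adjustment.

```java
// Hypothetical sketch, not ELKI code: effect of the dimensionality used for
// kernel normalization on a single neighbor's density contribution.
final class DimAdjustSketch {
  /** Gaussian kernel profile K(u) = exp(-u^2 / 2) / sqrt(2 * pi). */
  static double gaussian(double u) {
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  }

  /**
   * Contribution of one neighbor at distance dist for bandwidth h.
   * Assumption: the adjustment is the factor h^(-dim); dim = 0 disables it.
   */
  static double contribution(double dist, double h, int dim) {
    return Math.pow(h, -dim) * gaussian(dist / h);
  }

  public static void main(String[] args) {
    double dist = 0.5, h = 0.25;
    // Compare no adjustment, a low intrinsic dimensionality, and a higher
    // representation dimensionality.
    for (int dim : new int[] { 0, 2, 10 }) {
      System.out.println("dim=" + dim + " -> " + contribution(dist, h, dim));
    }
  }
}
```

For bandwidths below 1, the factor grows very quickly with the dimensionality, which is one way to see why a lower intrinsic dimensionality can give better-behaved density estimates.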
If your data set contains many duplicates, the kdeos.kernel.minbw parameter sets a minimum kernel bandwidth. This may improve results in such cases, because it prevents kernels from degenerating to single points.
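The effect of a minimum bandwidth can be sketched as follows. This is an illustration only, not the ELKI code: the names are hypothetical, and it assumes that the bandwidth is derived from neighbor distances and simply clamped from below.

```java
// Hypothetical sketch, not ELKI code: clamping a data-derived bandwidth.
final class MinBandwidthSketch {
  /** Assumption: the minimum bandwidth acts as a lower bound on the bandwidth. */
  static double effectiveBandwidth(double rawBandwidth, double minBandwidth) {
    return Math.max(rawBandwidth, minBandwidth);
  }

  /** Gaussian kernel value at distance dist for bandwidth h (1-d normalization). */
  static double kernel(double dist, double h) {
    double u = dist / h;
    return Math.exp(-0.5 * u * u) / (h * Math.sqrt(2 * Math.PI));
  }

  public static void main(String[] args) {
    // All nearest neighbors are exact duplicates, so the raw bandwidth is 0.
    double raw = 0.0;
    double h = effectiveBandwidth(raw, 1e-3);
    // Without clamping, 0/0 yields NaN; with clamping the value is finite.
    System.out.println("unclamped: " + kernel(0.0, raw));
    System.out.println("clamped:   " + kernel(0.0, h));
  }
}
```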
Reference:
Erich Schubert, Arthur Zimek, Hans-Peter Kriegel
Generalized Outlier Detection with Flexible Kernel Density Estimates
Proc. 14th SIAM International Conference on Data Mining (SDM 2014)
Modifier and Type | Class and Description |
---|---|
static class | KDEOS.Parameterizer<O>: Parameterization class |
Modifier and Type | Field and Description |
---|---|
(package private) static double | CUTOFF: Significance cutoff when computing kernel density. |
(package private) int | idim: Intrinsic dimensionality. |
(package private) KernelDensityFunction | kernel: Kernel function to use for density estimation. |
(package private) int | kmax: Minimum and maximum number of neighbors to use. |
(package private) int | kmin: Minimum and maximum number of neighbors to use. |
private static Logging | LOG: Class logger. |
(package private) double | minBandwidth: Kernel minimum bandwidth. |
(package private) double | scale: Kernel scaling parameter. |
Inherited fields: ALGORITHM_ID, DISTANCE_FUNCTION_ID
Constructor and Description |
---|
KDEOS(DistanceFunction<? super O> distanceFunction, int kmin, int kmax, KernelDensityFunction kernel, double minBandwidth, double scale, int idim): Constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | computeOutlierScores(KNNQuery<O> knnq, DBIDs ids, WritableDataStore<double[]> densities, WritableDoubleDataStore kdeos, DoubleMinMax minmax): Compute the final KDEOS scores. |
private int | dimensionality(Relation<O> rel): Ugly hack to allow using this implementation without having a well-defined dimensionality. |
protected void | estimateDensities(Relation<O> rel, KNNQuery<O> knnq, DBIDs ids, WritableDataStore<double[]> densities): Perform the kernel density estimation step. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
OutlierResult | run(Database database, Relation<O> rel): Run the KDEOS outlier detection algorithm. |
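The density store holds a double[] per object, and the class has both kmin and kmax fields, which suggests one density estimate per neighborhood size. The following sketch illustrates that layout under loudly stated assumptions: it is not the ELKI implementation, the names are hypothetical, the kernel is Gaussian with one-dimensional normalization, and the bandwidth for each k is simply taken to be the distance to the k-th neighbor.

```java
import java.util.Arrays;

// Hypothetical sketch, not ELKI code: one density estimate per k in [kmin, kmax].
final class MultiKDensitySketch {
  /** Gaussian kernel profile K(u) = exp(-u^2 / 2) / sqrt(2 * pi). */
  static double gaussian(double u) {
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  }

  /**
   * sortedDists: ascending distances from a query point to its neighbors,
   * with at least kmax entries. Returns an array indexed by k - kmin.
   * Assumes the k-th distance is nonzero (otherwise a minimum bandwidth
   * would be needed).
   */
  static double[] densities(double[] sortedDists, int kmin, int kmax) {
    double[] out = new double[kmax - kmin + 1];
    for (int k = kmin; k <= kmax; k++) {
      double h = sortedDists[k - 1]; // assumption: bandwidth = k-th neighbor distance
      double sum = 0;
      for (int i = 0; i < k; i++) {
        sum += gaussian(sortedDists[i] / h) / h;
      }
      out[k - kmin] = sum / k; // average kernel value over the k neighbors
    }
    return out;
  }

  public static void main(String[] args) {
    double[] dists = { 0.1, 0.2, 0.4, 0.8 };
    System.out.println(Arrays.toString(densities(dists, 2, 4)));
  }
}
```

Keeping all estimates in one array per object lets a later scoring step compare an object with its neighbors at every neighborhood size without recomputing densities.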
Inherited methods: getDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
private static final Logging LOG
KernelDensityFunction kernel
int kmin
int kmax
double scale
double minBandwidth
int idim
static final double CUTOFF
public KDEOS(DistanceFunction<? super O> distanceFunction, int kmin, int kmax, KernelDensityFunction kernel, double minBandwidth, double scale, int idim)
distanceFunction - Distance function
kmin - Minimum number of neighbors
kmax - Maximum number of neighbors
kernel - Kernel function
minBandwidth - Minimum bandwidth
scale - Kernel scaling parameter
idim - Intrinsic dimensionality (use 0 to use real dimensionality)

public OutlierResult run(Database database, Relation<O> rel)
database - Database to query
rel - Relation to process

protected void estimateDensities(Relation<O> rel, KNNQuery<O> knnq, DBIDs ids, WritableDataStore<double[]> densities)
rel - Relation to query
knnq - kNN query
ids - IDs to process
densities - Density storage

private int dimensionality(Relation<O> rel)
rel - Data relation

protected void computeOutlierScores(KNNQuery<O> knnq, DBIDs ids, WritableDataStore<double[]> densities, WritableDoubleDataStore kdeos, DoubleMinMax minmax)
knnq - kNN query
ids - IDs to process
densities - Density estimates
kdeos - Score outputs
minmax - Minimum and maximum scores

public TypeInformation[] getInputTypeRestriction()
Declared as getInputTypeRestriction in interface Algorithm and in class AbstractAlgorithm<OutlierResult>.
protected Logging getLogger()
Declared as getLogger in class AbstractAlgorithm<OutlierResult>.
Copyright © 2019 ELKI Development Team. License information.