@Title(value="Data Perturbation for Outlier Detection Ensembles")
@Description(value="A filter to perturb a dataset on read by an additive noise component, implemented for use in an outlier ensemble (this reference).")
@Reference(authors="A. Zimek, R. J. G. B. Campello, J. Sander", title="Data Perturbation for Outlier Detection Ensembles", booktitle="Proc. 26th International Conference on Scientific and Statistical Database Management (SSDBM), Aalborg, Denmark, 2014", url="http://dx.doi.org/10.1145/2618243.2618257")
public class PerturbationFilter<V extends NumberVector> extends AbstractVectorConversionFilter<V,V>
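The noise is scaled relative to a per-attribute reference value, such as the range of the attribute (maximumValue - minimumValue).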
This filter has a potentially wide use but has been implemented for the
following publication:
Reference:
A. Zimek, R. J. G. B. Campello, J. Sander: Data Perturbation for Outlier Detection Ensembles.
In: Proc. 26th International Conference on Scientific and Statistical Database Management (SSDBM), Aalborg, Denmark, 2014.
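The additive perturbation described above can be illustrated with a minimal, stand-alone sketch. It is not the ELKI implementation: the class name `AdditiveNoiseSketch`, the method `perturb`, and the choice of Gaussian noise whose standard deviation is `percentage * (maximumValue - minimumValue)` are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of additive, per-attribute noise; not the ELKI code. */
final class AdditiveNoiseSketch {
  /**
   * Returns a perturbed copy of {@code values}; the input array is not modified.
   * Assumption: the noise in dimension d is Gaussian with standard deviation
   * percentage * (maxima[d] - minima[d]).
   */
  static double[] perturb(double[] values, double[] minima, double[] maxima,
      double percentage, Random[] randomPerAttribute) {
    double[] out = new double[values.length];
    for (int d = 0; d < values.length; d++) {
      // Scale the noise by the extent of the attribute.
      double scale = percentage * (maxima[d] - minima[d]);
      out[d] = values[d] + randomPerAttribute[d].nextGaussian() * scale;
    }
    return out;
  }

  public static void main(String[] args) {
    double[] minima = {0.0, 10.0};
    double[] maxima = {1.0, 50.0};
    // One generator per attribute, so attributes are perturbed independently.
    Random[] rnd = {new Random(1L), new Random(2L)};
    double[] noisy = perturb(new double[] {0.5, 30.0}, minima, maxima, 0.1, rnd);
    System.out.println(Arrays.toString(noisy));
  }
}
```

With a percentage of 0, or for an attribute whose minimum equals its maximum, the sketch returns the input values unchanged.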
Modifier and Type | Class and Description |
---|---|
static class | PerturbationFilter.NoiseDistribution: Nature of the noise distribution. |
static class | PerturbationFilter.Parameterizer<V extends NumberVector>: Parameterization class. |
static class | PerturbationFilter.ScalingReference: Scaling reference options. |
Modifier and Type | Field and Description |
---|---|
private int | dimensionality: Stores the dimensionality from the preprocessing. |
private static Logging | LOG: Class logger. |
private double[] | maxima: Stores the maximum in each dimension. |
private double[] | minima: Stores the minimum in each dimension. |
private MeanVarianceMinMax[] | mvs: Temporary storage used during initialization. |
private PerturbationFilter.NoiseDistribution | noisedistribution: Nature of the noise distribution. |
private double | percentage: Percentage of the variance of the random noise generation, given the variance of the corresponding attribute in the data. |
private Random | RANDOM: Random object to generate the attribute-wise seeds for the noise. |
private Random[] | randomPerAttribute: The random objects to generate noise distributions independently for each attribute. |
private PerturbationFilter.ScalingReference | scalingreference: Which reference to use for scaling the noise. |
private double[] | scalingreferencevalues: Stores the scaling reference in each dimension. |
Inherited fields: factory
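The relationship between `RANDOM` and `randomPerAttribute` can be sketched as follows: a master generator, created from an optional seed, produces one seed per attribute. This is a hedged illustration rather than the actual ELKI code; the class name `PerAttributeRandomSketch` and the method `create` are hypothetical, and drawing the per-attribute seeds with `nextLong()` is an assumption.

```java
import java.util.Random;

/** Hypothetical sketch of attribute-wise seeding; not the ELKI code. */
final class PerAttributeRandomSketch {
  /**
   * Creates one generator per attribute.
   *
   * @param seed master seed, may be null for a random seed
   * @param dimensionality number of attributes
   */
  static Random[] create(Long seed, int dimensionality) {
    // The master generator only hands out seeds; it generates no noise itself.
    Random master = (seed == null) ? new Random() : new Random(seed);
    Random[] perAttribute = new Random[dimensionality];
    for (int d = 0; d < dimensionality; d++) {
      // Assumption: each attribute gets its own seed drawn from the master.
      perAttribute[d] = new Random(master.nextLong());
    }
    return perAttribute;
  }

  public static void main(String[] args) {
    Random[] rnd = create(42L, 3);
    for (int d = 0; d < rnd.length; d++) {
      System.out.println("attribute " + d + ": " + rnd[d].nextGaussian());
    }
  }
}
```

Using a fixed master seed makes the whole perturbation reproducible, while separate generators keep the noise streams of different attributes independent of the order in which values are drawn.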
Constructor and Description |
---|
PerturbationFilter(Long seed, double percentage, PerturbationFilter.ScalingReference scalingreference, double[] minima, double[] maxima, PerturbationFilter.NoiseDistribution noisedistribution): Constructor. |
Modifier and Type | Method and Description |
---|---|
protected SimpleTypeInformation<? super V> | convertedType(SimpleTypeInformation<V> in): Get the output type from the input type after conversion. |
protected V | filterSingleObject(V featureVector): Normalize a single instance. |
protected SimpleTypeInformation<? super V> | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Class logger. |
protected void | prepareComplete(): Complete the initialization phase. |
protected void | prepareProcessInstance(V featureVector): Process a single object during initialization. |
protected boolean | prepareStart(SimpleTypeInformation<V> in): Return "true" when the normalization needs initialization (two-pass filtering!). |
Inherited methods: initializeOutputType, filter
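The two-pass protocol implied by `prepareStart`, `prepareProcessInstance`, `prepareComplete`, and `filterSingleObject` can be sketched in isolation. The following is a simplified stand-in, not the ELKI class hierarchy: `TwoPassSketch` works on plain `double[]` rows, its `filter` driver is hypothetical, and the uniform noise scaled by the attribute extent is an assumption chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a two-pass filter lifecycle; not the ELKI code. */
final class TwoPassSketch {
  private final double percentage;
  private final Random random;
  private double[] minima;
  private double[] maxima;
  private double[] extents;

  TwoPassSketch(double percentage, long seed) {
    this.percentage = percentage;
    this.random = new Random(seed);
  }

  /** Returns true because statistics must be collected in a first pass. */
  boolean prepareStart(int dimensionality) {
    minima = new double[dimensionality];
    maxima = new double[dimensionality];
    Arrays.fill(minima, Double.POSITIVE_INFINITY);
    Arrays.fill(maxima, Double.NEGATIVE_INFINITY);
    return true;
  }

  /** First pass: update the per-dimension minimum and maximum. */
  void prepareProcessInstance(double[] v) {
    for (int d = 0; d < v.length; d++) {
      minima[d] = Math.min(minima[d], v[d]);
      maxima[d] = Math.max(maxima[d], v[d]);
    }
  }

  /** End of first pass: derive the scaling reference per dimension. */
  void prepareComplete() {
    extents = new double[minima.length];
    for (int d = 0; d < extents.length; d++) {
      extents[d] = maxima[d] - minima[d];
    }
  }

  /** Second pass: add uniform noise in [-0.5, 0.5) * percentage * extent. */
  double[] filterSingleObject(double[] v) {
    double[] out = new double[v.length];
    for (int d = 0; d < v.length; d++) {
      out[d] = v[d] + (random.nextDouble() - 0.5) * percentage * extents[d];
    }
    return out;
  }

  /** Hypothetical driver that calls the hooks in order. */
  List<double[]> filter(List<double[]> data) {
    List<double[]> out = new ArrayList<>(data.size());
    if (data.isEmpty()) {
      return out;
    }
    if (prepareStart(data.get(0).length)) {
      for (double[] v : data) {
        prepareProcessInstance(v);
      }
      prepareComplete();
    }
    for (double[] v : data) {
      out.add(filterSingleObject(v));
    }
    return out;
  }

  public static void main(String[] args) {
    List<double[]> data = Arrays.asList(
        new double[] {0.0, 10.0}, new double[] {4.0, 10.0}, new double[] {2.0, 10.0});
    for (double[] row : new TwoPassSketch(0.5, 7L).filter(data)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

The first pass must see every object before any object is filtered, which is why the data is traversed twice.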
private static final Logging LOG
private PerturbationFilter.ScalingReference scalingreference
private PerturbationFilter.NoiseDistribution noisedistribution
private final Random RANDOM
private double percentage
private MeanVarianceMinMax[] mvs
private double[] scalingreferencevalues
private Random[] randomPerAttribute
private double[] maxima
private double[] minima
private int dimensionality
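The `mvs` field holds one statistics accumulator per dimension during initialization. As a rough illustration of what such an accumulator might do, the sketch below tracks mean, variance, minimum, and maximum in a single pass using Welford's update. The class `RunningStatsSketch` is hypothetical and is not ELKI's `MeanVarianceMinMax`; in particular, reporting the sample variance (dividing by n - 1) is an assumption.

```java
/** Hypothetical one-pass statistics accumulator; not ELKI's MeanVarianceMinMax. */
final class RunningStatsSketch {
  private long n;
  private double mean;
  private double m2; // sum of squared deviations from the current mean
  private double min = Double.POSITIVE_INFINITY;
  private double max = Double.NEGATIVE_INFINITY;

  /** Adds one value using Welford's numerically stable update. */
  void put(double x) {
    n++;
    double delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);
    min = Math.min(min, x);
    max = Math.max(max, x);
  }

  double getMean() {
    return mean;
  }

  /** Sample variance; 0.0 if fewer than two values were added. */
  double getSampleVariance() {
    return n > 1 ? m2 / (n - 1) : 0.0;
  }

  double getSampleStddev() {
    return Math.sqrt(getSampleVariance());
  }

  double getMin() {
    return min;
  }

  double getMax() {
    return max;
  }

  public static void main(String[] args) {
    RunningStatsSketch stats = new RunningStatsSketch();
    for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
      stats.put(x);
    }
    System.out.println("mean=" + stats.getMean() + " stddev=" + stats.getSampleStddev()
        + " range=" + (stats.getMax() - stats.getMin()));
  }
}
```

Either the standard deviation or the range from such an accumulator could then serve as the per-dimension scaling reference.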
public PerturbationFilter(Long seed, double percentage, PerturbationFilter.ScalingReference scalingreference, double[] minima, double[] maxima, PerturbationFilter.NoiseDistribution noisedistribution)
Parameters:
seed - Seed value, may be null for a random seed.
percentage - Relative amount of jitter to add.
scalingreference - Scaling reference.
minima - Preset minimum values. May be null.
maxima - Preset maximum values. May be null.
noisedistribution - Nature of the noise distribution.
protected boolean prepareStart(SimpleTypeInformation<V> in)
prepareStart in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
Parameters:
in - Input type information
protected void prepareProcessInstance(V featureVector)
prepareProcessInstance in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
Parameters:
featureVector - Object to process
protected void prepareComplete()
prepareComplete in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
protected SimpleTypeInformation<? super V> getInputTypeRestriction()
getInputTypeRestriction in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
protected V filterSingleObject(V featureVector)
filterSingleObject in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
Parameters:
featureVector - Database object to normalize
protected SimpleTypeInformation<? super V> convertedType(SimpleTypeInformation<V> in)
convertedType in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
Parameters:
in - Input type restriction
protected Logging getLogger()
getLogger in class AbstractConversionFilter<V extends NumberVector,V extends NumberVector>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.