@Reference(authors="E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel", title="On Evaluation of Outlier Rankings and Outlier Scores", booktitle="Proc. 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.")
public class GreedyEnsembleExperiment extends AbstractApplication
Class to load an outlier detection summary file, as produced by
ComputeKNNOutlierScores, and compute a naive ensemble for it. Based
on this initial estimation, an optimized ensemble is built using a greedy
strategy. Starting with the best candidate only as initial ensemble, the most
diverse candidate is investigated at each step. If it improves towards the
(estimated) target vector, it is added, otherwise it is discarded.
This approach is naive, and it may be surprising that it can improve results.
The reason is probably that diversity will result in a comparable ensemble,
while the reduced ensemble size is actually responsible for the improvements,
by being more decisive and less noisy due to dropping "unhelpful" members.
This still leaves quite a bit of room for improvement. If you build upon this
basic approach, please acknowledge our proof of concept work.
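The greedy strategy described above can be illustrated with a minimal, self-contained sketch. It is not the ELKI implementation: the class name `GreedyEnsembleSketch`, the use of Euclidean distance, the plain averaging vote, and the choice to measure diversity as distance from the current ensemble are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of greedy ensemble selection (not the ELKI code). */
class GreedyEnsembleSketch {
  /** Euclidean distance between two score vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Ensemble vote, assumed here to be the plain mean of the member scores. */
  static double[] mean(List<double[]> members) {
    double[] m = new double[members.get(0).length];
    for (double[] v : members) {
      for (int i = 0; i < m.length; i++) {
        m[i] += v[i] / members.size();
      }
    }
    return m;
  }

  /** Returns the indices of the candidates kept by the greedy selection. */
  static List<Integer> select(double[][] candidates, double[] target) {
    int n = candidates.length;
    // Start with the single candidate closest to the (estimated) target.
    int best = 0;
    for (int i = 1; i < n; i++) {
      if (distance(candidates[i], target) < distance(candidates[best], target)) {
        best = i;
      }
    }
    boolean[] used = new boolean[n];
    used[best] = true;
    List<Integer> chosen = new ArrayList<>();
    chosen.add(best);
    List<double[]> members = new ArrayList<>();
    members.add(candidates[best]);

    for (int step = 1; step < n; step++) {
      double[] current = mean(members);
      // Most diverse candidate: farthest from the current ensemble (assumption).
      int pick = -1;
      double farthest = -1;
      for (int i = 0; i < n; i++) {
        if (!used[i] && distance(candidates[i], current) > farthest) {
          farthest = distance(candidates[i], current);
          pick = i;
        }
      }
      used[pick] = true; // each candidate is examined exactly once
      members.add(candidates[pick]);
      if (distance(mean(members), target) < distance(current, target)) {
        chosen.add(pick); // moves the ensemble towards the target: keep it
      } else {
        members.remove(members.size() - 1); // otherwise discard it
      }
    }
    return chosen;
  }
}
```

With the candidates `{0.9, 0.0}`, `{0.0, 1.0}`, `{1.2, 0.0}` and the target `{1.0, 0.0}`, this sketch starts from candidate 0, rejects the highly diverse candidate 1 because it moves the ensemble away from the target, and keeps candidate 2.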
Reference:
E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
In Proceedings of the 12th SIAM International Conference on Data Mining
(SDM), Anaheim, CA, 2012.
Modifier and Type | Class and Description |
---|---|
static class | GreedyEnsembleExperiment.Distance - Distance modes. |
static class | GreedyEnsembleExperiment.Parameterizer - Parameterization class. |
Modifier and Type | Field and Description |
---|---|
(package private) GreedyEnsembleExperiment.Distance | distance - Distance in use. |
private InputStep | inputstep - The data input part. |
private static Logging | LOG - Get static logger. |
(package private) int | minvote - Minimum votes. |
(package private) ScalingFunction | prescaling - Outlier scaling to apply during preprocessing. |
(package private) double | rate - Expected rate of outliers. |
(package private) boolean | refine_truth - Variant, where the truth vector is also updated. |
(package private) ScalingFunction | scaling - Outlier scaling to apply to constructed ensembles. |
(package private) EnsembleVoting | voting - Ensemble voting method. |
Fields inherited from class AbstractApplication: INFORMATION
Constructor and Description |
---|
GreedyEnsembleExperiment(InputStep inputstep, EnsembleVoting voting, GreedyEnsembleExperiment.Distance distance, ScalingFunction prescaling, ScalingFunction scaling, double rate) - Constructor. |
Modifier and Type | Method and Description |
---|---|
static Relation<NumberVector<?>> | applyPrescaling(ScalingFunction scaling, Relation<NumberVector<?>> relation, DBIDs skip) - Prescale each vector (except when in skip) with the given scaling function. |
private static void | applyScaling(double[] raw, ScalingFunction scaling) |
(package private) double | gain(double score, double ref, double optimal) - Compute the gain coefficient. |
private PrimitiveDoubleDistanceFunction<NumberVector<?>> | getDistanceFunction(double[] estimated_weights) |
static void | main(String[] args) - Main method. |
void | run() - Runs the application. |
protected void | singleEnsemble(double[] ensemble, NumberVector<?> vec) - Build a single-element "ensemble". |
protected void | updateEstimations(int[] outliers, int numoutliers, double[] weights, double[] truth) |
Methods inherited from class AbstractApplication: printErrorMessage, runCLIApplication, usage
private static final Logging LOG
private InputStep inputstep
boolean refine_truth
EnsembleVoting voting
ScalingFunction prescaling
ScalingFunction scaling
double rate
int minvote
GreedyEnsembleExperiment.Distance distance
public GreedyEnsembleExperiment(InputStep inputstep, EnsembleVoting voting, GreedyEnsembleExperiment.Distance distance, ScalingFunction prescaling, ScalingFunction scaling, double rate)
inputstep - Input step
voting - Ensemble voting
distance - Distance function
prescaling - Scaling to apply to input data
scaling - Scaling to apply to ensemble members
rate - Expected rate of outliers
public void run()
run in class AbstractApplication
protected void singleEnsemble(double[] ensemble, NumberVector<?> vec)
ensemble -
vec -
public static Relation<NumberVector<?>> applyPrescaling(ScalingFunction scaling, Relation<NumberVector<?>> relation, DBIDs skip)
Prescale each vector (except when in skip) with the given scaling function.
scaling - Scaling function
relation - Relation to read
skip - DBIDs to pass unmodified
private static void applyScaling(double[] raw, ScalingFunction scaling)
protected void updateEstimations(int[] outliers, int numoutliers, double[] weights, double[] truth)
private PrimitiveDoubleDistanceFunction<NumberVector<?>> getDistanceFunction(double[] estimated_weights)
double gain(double score, double ref, double optimal)
score - New score
ref - Reference score
optimal - Maximum score possible
public static void main(String[] args)
args - Command line parameters.
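This page does not state the formula behind gain(double score, double ref, double optimal). One plausible reading, given the parameter descriptions (new score, reference score, maximum score possible), is a normalized improvement: the fraction of the remaining distance from the reference to the optimum that the new score covers. The sketch below implements that reading; the class name `GainSketch` and the formula itself are assumptions, not taken from this documentation.

```java
/** Hypothetical sketch of a gain coefficient (the formula is an assumption). */
final class GainSketch {
  /**
   * Relative gain of score over ref, normalized by the best achievable
   * improvement (optimal - ref). Yields 0 when score equals ref, 1 when
   * score equals optimal, and a negative value when score is worse than ref.
   * Assumes optimal differs from ref; otherwise the division is undefined.
   */
  static double gain(double score, double ref, double optimal) {
    return 1.0 - (optimal - score) / (optimal - ref);
  }
}
```

Under this definition, a score of 0.75 against a reference of 0.5 and an optimum of 1.0 gives a gain of 0.5, meaning half of the possible improvement was achieved.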