@Reference(authors="E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel", title="On Evaluation of Outlier Rankings and Outlier Scores", booktitle="Proc. 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.")
public class GreedyEnsembleExperiment extends AbstractApplication
Class to load an outlier detection summary file, as produced by
ComputeKNNOutlierScores, and compute a naive ensemble for it. Based
on this initial estimation, an optimized ensemble is built using a greedy
strategy. Starting with the best candidate only as initial ensemble, the most
diverse candidate is investigated at each step. If adding it moves the ensemble
closer to the (estimated) target vector, it is kept; otherwise it is discarded.
This approach is naive, and it may be surprising that it can improve results.
The reason is probably that diversity will result in a comparable ensemble,
while the reduced ensemble size is actually responsible for the improvements,
by being more decisive and less noisy due to dropping "unhelpful" members.
This still leaves quite a bit of room for improvement. If you build upon this
basic approach, please acknowledge our proof of concept work.
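The greedy strategy described above can be sketched roughly as follows. This is a minimal illustration, not the ELKI implementation: the class name `GreedyEnsembleSketch`, the helpers `distance` and `mean`, the use of plain Euclidean distance, and the unweighted mean as ensemble combination are all assumptions made for this example. Each candidate is assumed to be a vector of normalized outlier scores, one entry per object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of greedy ensemble selection; not part of ELKI.
class GreedyEnsembleSketch {
  /** Euclidean distance between two equal-length score vectors (assumed metric). */
  static double distance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Unweighted mean of the members, optionally including one extra vector. */
  static double[] mean(List<double[]> members, double[] extra) {
    int dim = (extra != null) ? extra.length : members.get(0).length;
    double[] m = new double[dim];
    for (double[] v : members) {
      for (int i = 0; i < dim; i++) {
        m[i] += v[i];
      }
    }
    int n = members.size();
    if (extra != null) {
      for (int i = 0; i < dim; i++) {
        m[i] += extra[i];
      }
      n++;
    }
    for (int i = 0; i < dim; i++) {
      m[i] /= n;
    }
    return m;
  }

  /** Returns the indices of the selected candidates, in order of selection. */
  static List<Integer> select(double[][] candidates, double[] target) {
    List<Integer> remaining = new ArrayList<>();
    for (int i = 0; i < candidates.length; i++) {
      remaining.add(i);
    }
    // Initial ensemble: the single candidate closest to the target vector.
    int best = remaining.get(0);
    for (int i : remaining) {
      if (distance(candidates[i], target) < distance(candidates[best], target)) {
        best = i;
      }
    }
    remaining.remove(Integer.valueOf(best));
    List<Integer> chosen = new ArrayList<>();
    List<double[]> members = new ArrayList<>();
    chosen.add(best);
    members.add(candidates[best]);
    while (!remaining.isEmpty()) {
      double[] current = mean(members, null);
      // Investigate the candidate that differs most from the current ensemble.
      int diverse = remaining.get(0);
      for (int i : remaining) {
        if (distance(candidates[i], current) > distance(candidates[diverse], current)) {
          diverse = i;
        }
      }
      remaining.remove(Integer.valueOf(diverse));
      // Keep it only if the ensemble moves closer to the target; otherwise discard.
      double[] trial = mean(members, candidates[diverse]);
      if (distance(trial, target) < distance(current, target)) {
        chosen.add(diverse);
        members.add(candidates[diverse]);
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    double[][] candidates = { { 0.8, 0, 0 }, { 1.4, 0, 0 }, { 0, 1, 1 } };
    double[] target = { 1, 0, 0 };
    System.out.println(select(candidates, target));
  }
}
```

Every candidate is examined exactly once and then either kept or dropped, so the loop always terminates and the resulting ensemble can be much smaller than the full candidate set.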
Reference:
E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
In Proceedings of the 12th SIAM International Conference on Data Mining
(SDM), Anaheim, CA, 2012.
Modifier and Type | Class and Description |
---|---|
static class | GreedyEnsembleExperiment.Parameterizer: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private InputStep | inputstep: The data input part. |
private static Logging | LOG: Get static logger. |
(package private) boolean | refine_truth: Variant, where the truth vector is also updated. |
Fields inherited from class AbstractApplication: INFORMATION
Constructor and Description |
---|
GreedyEnsembleExperiment(InputStep inputstep): Constructor. |
Modifier and Type | Method and Description |
---|---|
private double | computeROCAUC(NumberVector<?> vec, Set<Integer> positive, int dim) |
(package private) double | gain(double score, double ref, double optimal): Compute the gain coefficient. |
private PrimitiveDoubleDistanceFunction<NumberVector<?>> | getDistanceFunction(double[] estimated_weights) |
static void | main(String[] args): Main method. |
void | run(): Runs the application. |
protected void | updateEstimations(int[] outliers_seen, int union_outliers, double[] estimated_weights, double[] estimated_truth) |
Methods inherited from class AbstractApplication: printErrorMessage, runCLIApplication, usage
private static final Logging LOG
private InputStep inputstep
boolean refine_truth
public GreedyEnsembleExperiment(InputStep inputstep)
inputstep - Input step
public void run()
run in class AbstractApplication
protected void updateEstimations(int[] outliers_seen, int union_outliers, double[] estimated_weights, double[] estimated_truth)
private PrimitiveDoubleDistanceFunction<NumberVector<?>> getDistanceFunction(double[] estimated_weights)
private double computeROCAUC(NumberVector<?> vec, Set<Integer> positive, int dim)
double gain(double score, double ref, double optimal)
score - New score
ref - Reference score
optimal - Maximum score possible
public static void main(String[] args)
args - Command line parameters.
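The gain(score, ref, optimal) method is documented here only through its parameters, so its exact formula is not given on this page. One plausible reading, shown purely as an assumption, is the fraction of the possible improvement over the reference score that the new score achieves. The class name `GainSketch` is hypothetical.

```java
// Hypothetical sketch; the formula is an assumption, not taken from the ELKI source.
class GainSketch {
  /**
   * Assumed gain coefficient: 0 means no improvement over the reference,
   * 1 means the optimal score was reached, negative values mean the new
   * score is worse than the reference.
   */
  static double gain(double score, double ref, double optimal) {
    return (score - ref) / (optimal - ref);
  }

  public static void main(String[] args) {
    // Example: reference ROC AUC 0.8, new ROC AUC 0.9, best possible 1.0.
    System.out.println(gain(0.9, 0.8, 1.0));
  }
}
```

Under this reading the value is undefined when ref equals optimal, so a caller would need to handle a reference that is already optimal separately.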