O
- Object type@Title(value="SOS: Stochastic Outlier Selection") @Reference(authors="J. Janssens, F. Husz\u00e1r, E. Postma, J. van den Herik", title="Stochastic Outlier Selection", booktitle="TiCC TR 2012\u2013001", url="https://www.tilburguniversity.edu/upload/b7bac5b2-9b00-402a-9261-7849aa019fbb_sostr.pdf", bibkey="tr/tilburg/JanssensHPv12") public class SOS<O> extends AbstractDistanceBasedAlgorithm<O,OutlierResult> implements OutlierAlgorithm
Reference:
J. Janssens, F. Huszár, E. Postma, J. van den Herik
Stochastic Outlier Selection
TiCC TR 2012–001
Modifier and Type | Class and Description |
---|---|
static class |
SOS.Parameterizer<O>
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
private static Logging |
LOG
Class logger.
|
protected double |
perplexity
Perplexity
|
protected static double |
PERPLEXITY_ERROR
Threshold for optimizing perplexity.
|
protected static int |
PERPLEXITY_MAXITER
Maximum number of iterations when optimizing perplexity.
|
ALGORITHM_ID
DISTANCE_FUNCTION_ID
Constructor and Description |
---|
SOS(DistanceFunction<? super O> distance,
double h)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
protected static double |
computeH(DBIDRef ignore,
DoubleDBIDListIter it,
double[] p,
double mbeta)
Compute H (observed perplexity) for row i, and the row pij_i.
|
static double |
computePi(DBIDRef ignore,
DoubleDBIDListIter it,
double[] p,
double perplexity,
double logPerp)
Compute row p[i], using binary search on the kernel bandwidth sigma to
obtain the desired perplexity.
|
protected static double |
estimateInitialBeta(DBIDRef ignore,
DoubleDBIDListIter it,
double perplexity)
Estimate beta from the distances in a row.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
static void |
nominateNeighbors(DBIDIter ignore,
DBIDArrayIter di,
double[] p,
double norm,
WritableDoubleDataStore scores)
Vote for neighbors not being outliers.
|
OutlierResult |
run(Relation<O> relation)
Run the algorithm.
|
static double |
sumOfProbabilities(DBIDIter ignore,
DBIDArrayIter di,
double[] p)
Compute the sum of probabilities, stop at first 0, ignore query object.
|
getDistanceFunction
run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
private static final Logging LOG
protected static final double PERPLEXITY_ERROR
protected static final int PERPLEXITY_MAXITER
protected double perplexity
public SOS(DistanceFunction<? super O> distance, double h)
distance
- Distance functionh
- Perplexitypublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<OutlierResult>
public OutlierResult run(Relation<O> relation)
relation
- data relationpublic static double sumOfProbabilities(DBIDIter ignore, DBIDArrayIter di, double[] p)
ignore
- Object to ignore.di
- Object listp
- Probabilitiespublic static void nominateNeighbors(DBIDIter ignore, DBIDArrayIter di, double[] p, double norm, WritableDoubleDataStore scores)
ignore
- Object to ignoredi
- Neighbor object IDs.p
- Probabilitiesnorm
- Normalization factor (1/sum)scores
- Output score storagepublic static double computePi(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)
ignore
- Object to skipit
- Distances iteratorp
- Output rowperplexity
- Desired perplexitylogPerp
- Log of desired perplexity@Reference(authors="Erich Schubert, Michael Gertz", title="Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection: A Remedy Against the Curse of Dimensionality?", booktitle="Proc. Int. Conf. Similarity Search and Applications, SISAP\'2017", url="https://doi.org/10.1007/978-3-319-68474-1_13", bibkey="DBLP:conf/sisap/SchubertG17") protected static double estimateInitialBeta(DBIDRef ignore, DoubleDBIDListIter it, double perplexity)
This lacks a thorough mathematical argument, but is a handcrafted heuristic to avoid numerical problems. The average distance is usually too large, so we scale the average distance by 2*N/perplexity. Then estimate beta as 1/x.
ignore
- Object to skipit
- Distance iteratorperplexity
- Desired perplexityprotected static double computeH(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double mbeta)
ignore
- Object to skipit
- Distances listp
- Output probabilitiesmbeta
- -1. / (2 * sigma * sigma)
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<OutlierResult>
Copyright © 2019 ELKI Development Team. License information.