O - Object type@Title(value="SOS: Stochastic Outlier Selection") @Reference(authors="J. Janssens, F. Husz\u00e1r, E. Postma, J. van den Herik", title="Stochastic Outlier Selection", booktitle="TiCC TR 2012\u2013001", url="https://www.tilburguniversity.edu/upload/b7bac5b2-9b00-402a-9261-7849aa019fbb_sostr.pdf", bibkey="tr/tilburg/JanssensHPv12") public class SOS<O> extends AbstractDistanceBasedAlgorithm<O,OutlierResult> implements OutlierAlgorithm
Reference:
J. Janssens, F. Huszár, E. Postma, J. van den Herik
Stochastic Outlier Selection
TiCC TR 2012–001
| Modifier and Type | Class and Description |
|---|---|
static class |
SOS.Parameterizer<O>
Parameterization class.
|
| Modifier and Type | Field and Description |
|---|---|
private static Logging |
LOG
Class logger.
|
protected double |
perplexity
Perplexity
|
protected static double |
PERPLEXITY_ERROR
Threshold for optimizing perplexity.
|
protected static int |
PERPLEXITY_MAXITER
Maximum number of iterations when optimizing perplexity.
|
ALGORITHM_IDDISTANCE_FUNCTION_ID| Constructor and Description |
|---|
SOS(DistanceFunction<? super O> distance,
double h)
Constructor.
|
| Modifier and Type | Method and Description |
|---|---|
protected static double |
computeH(DBIDRef ignore,
DoubleDBIDListIter it,
double[] p,
double mbeta)
Compute H (observed perplexity) for row i, and the row pij_i.
|
static double |
computePi(DBIDRef ignore,
DoubleDBIDListIter it,
double[] p,
double perplexity,
double logPerp)
Compute row p[i], using binary search on the kernel bandwidth sigma to
obtain the desired perplexity.
|
protected static double |
estimateInitialBeta(DBIDRef ignore,
DoubleDBIDListIter it,
double perplexity)
Estimate beta from the distances in a row.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
static void |
nominateNeighbors(DBIDIter ignore,
DBIDArrayIter di,
double[] p,
double norm,
WritableDoubleDataStore scores)
Vote for neighbors not being outliers.
|
OutlierResult |
run(Relation<O> relation)
Run the algorithm.
|
static double |
sumOfProbabilities(DBIDIter ignore,
DBIDArrayIter di,
double[] p)
Compute the sum of probabilities, stop at first 0, ignore query object.
|
getDistanceFunctionrunclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitrunprivate static final Logging LOG
protected static final double PERPLEXITY_ERROR
protected static final int PERPLEXITY_MAXITER
protected double perplexity
public SOS(DistanceFunction<? super O> distance, double h)
distance - Distance functionh - Perplexitypublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithmgetInputTypeRestriction in interface AlgorithmgetInputTypeRestriction in class AbstractAlgorithm<OutlierResult>public OutlierResult run(Relation<O> relation)
relation - data relationpublic static double sumOfProbabilities(DBIDIter ignore, DBIDArrayIter di, double[] p)
ignore - Object to ignore.di - Object listp - Probabilitiespublic static void nominateNeighbors(DBIDIter ignore, DBIDArrayIter di, double[] p, double norm, WritableDoubleDataStore scores)
ignore - Object to ignoredi - Neighbor object IDs.p - Probabilitiesnorm - Normalization factor (1/sum)scores - Output score storagepublic static double computePi(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double perplexity, double logPerp)
ignore - Object to skipit - Distances iteratorp - Output rowperplexity - Desired perplexitylogPerp - Log of desired perplexity@Reference(authors="Erich Schubert, Michael Gertz", title="Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection: A Remedy Against the Curse of Dimensionality?", booktitle="Proc. Int. Conf. Similarity Search and Applications, SISAP\'2017", url="https://doi.org/10.1007/978-3-319-68474-1_13", bibkey="DBLP:conf/sisap/SchubertG17") protected static double estimateInitialBeta(DBIDRef ignore, DoubleDBIDListIter it, double perplexity)
This lacks a thorough mathematical argument, but is a handcrafted heuristic to avoid numerical problems. The average distance is usually too large, so we scale the average distance by 2*N/perplexity. Then estimate beta as 1/x.
ignore - Object to skipit - Distance iteratorperplexity - Desired perplexityprotected static double computeH(DBIDRef ignore, DoubleDBIDListIter it, double[] p, double mbeta)
ignore - Object to skipit - Distances listp - Output probabilitiesmbeta - -1. / (2 * sigma * sigma)protected Logging getLogger()
AbstractAlgorithmgetLogger in class AbstractAlgorithm<OutlierResult>Copyright © 2019 ELKI Development Team. License information.